The spatial prediction sandbox - Investigating the use of spatially-explicit modelling and cross-validation strategies in spatial interpolation machine learning problems
Main author: | Milà Garcia, Carles |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/113881 |
Abstract: | Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies |
id |
RCAP_afc7a5ce93b0507ee55799eaf3568842 |
oai_identifier_str |
oai:run.unl.pt:10362/113881 |
network_acronym_str |
RCAP |
repository_id_str |
7160 |
abstract |
Machine Learning (ML) methods are increasingly used for spatial interpolation, and different strategies have been proposed to introduce space into the modelling and validation phases. Nevertheless, a comparison of these methods under different landscape autocorrelation ranges and sampling designs is still missing. This Master Thesis investigates under which scenarios spatially-explicit ML modelling and validation strategies are appropriate for spatial interpolation problems. We designed a framework that allowed us to simulate predictor and outcome spatial fields with different autocorrelation ranges, as well as samples with different numbers of points and distributions. With these data, we tested different non-spatial and spatially-explicit (coordinates, EDF, RFsp) Random Forest ML models and evaluated them both against the simulated surfaces and with standard (Leave-One-Out, LOO) and spatially-explicit (spatial buffer LOO, sbLOO) Cross-Validation (CV) strategies. We developed a new method called Nearest Distance Matching (NDM) to estimate the appropriate radius for sbLOO CV in spatial interpolation based on sample distribution and landscape range, and compared it to state-of-the-art radius-search methods, which are based only on the range. For short ranges, non-spatial models outperformed spatially-explicit models regardless of sample size and distribution; for long ranges, spatial models performed better under regular and random sampling designs, but not under clustered and non-uniform ones.
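The abstract contrasts standard Leave-One-Out (LOO) CV with spatial buffer LOO (sbLOO), in which training points within a given radius of each held-out point are discarded before predicting it. The block below is a minimal sketch of that idea, not the thesis's code: it assumes inverse distance weighting (IDW) as a stand-in interpolator in place of the Random Forest models used in the thesis, and the names `idw_predict` and `sbloo_rmse` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def idw_predict(train_xy: Sequence[Point], train_z: Sequence[float],
                target: Point, power: float = 2.0) -> float:
    """Inverse distance weighting: a stand-in interpolator (assumption), NOT
    the Random Forest models evaluated in the thesis."""
    num = 0.0
    den = 0.0
    for p, z in zip(train_xy, train_z):
        d = math.dist(p, target)
        if d == 0.0:
            return z  # co-located training point: use its value directly
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def sbloo_rmse(xy: Sequence[Point], z: Sequence[float], radius: float) -> float:
    """RMSE of spatial buffer leave-one-out (sbLOO) cross-validation.

    For each held-out point, training points strictly closer than `radius`
    are dropped; radius=0 keeps all other points, i.e. ordinary LOO.
    """
    sq_errors: List[float] = []
    for i, target in enumerate(xy):
        train = [j for j in range(len(xy))
                 if j != i and math.dist(xy[j], target) >= radius]
        if not train:
            continue  # the buffer removed every training point; skip this fold
        pred = idw_predict([xy[j] for j in train], [z[j] for j in train], target)
        sq_errors.append((pred - z[i]) ** 2)
    if not sq_errors:
        raise ValueError("buffer radius excludes all training points in every fold")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy example (hypothetical data): five samples on a transect, linear field z = x.
xy = [(float(x), 0.0) for x in range(5)]
z = [x for x, _ in xy]
loo = sbloo_rmse(xy, z, radius=0.0)
sbloo = sbloo_rmse(xy, z, radius=1.5)
print(f"LOO RMSE: {loo:.3f}  sbLOO RMSE (r=1.5): {sbloo:.3f}")
```

On this toy transect the buffered estimate is larger than plain LOO, because each held-out point must be predicted from more distant neighbours. Whether the larger error is the appropriate one depends on how far the real prediction locations lie from the samples, which is what the radius choice (and the abstract's NDM method) is meant to account for.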
CV results indicated that, although LOO correctly estimated model performance under random designs, it yielded overestimated errors for regular samples and underestimated errors for clustered and non-uniform designs under long ranges. sbLOO combined with NDM correctly addressed the error underestimation of LOO in clustered and non-uniform samples, whereas sbLOO based solely on the range resulted in error overestimation for all designs under long ranges. This Master Thesis provides important insights for the field of predictive mapping: it elucidates in which cases spatially-explicit methods may be preferred, establishes that state-of-the-art approaches for spatial CV designed to assess model transferability are not suited for spatial interpolation, and proposes an alternative. |
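The abstract reports that LOO underestimates errors for clustered designs and that taking the sample distribution into account when choosing the sbLOO radius fixes this. One way to see why, sketched below under our own assumptions, is to compare nearest-neighbour distances: from each prediction location to its closest sample, versus from each held-out sample to its closest training sample during CV. `match_radius` is a hypothetical heuristic that picks the smallest candidate buffer radius whose median CV nearest distance reaches the median prediction nearest distance; it ignores the landscape range and is not the NDM procedure defined in the thesis. All names and data here are illustrative.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def prediction_nn_distances(pred_xy: Sequence[Point],
                            sample_xy: Sequence[Point]) -> List[float]:
    """Distance from each prediction location to its nearest sample."""
    return [min(math.dist(p, s) for s in sample_xy) for p in pred_xy]


def cv_nn_distances(sample_xy: Sequence[Point], radius: float) -> List[float]:
    """Distance from each held-out sample to its nearest training sample when
    samples strictly closer than `radius` are dropped (radius=0 is plain LOO)."""
    out: List[float] = []
    for i, p in enumerate(sample_xy):
        kept = []
        for j, s in enumerate(sample_xy):
            d = math.dist(p, s)
            if j != i and d >= radius:
                kept.append(d)
        if kept:
            out.append(min(kept))
    return out


def match_radius(pred_xy: Sequence[Point], sample_xy: Sequence[Point],
                 candidates: Sequence[float]) -> Optional[float]:
    """Hypothetical heuristic (not the thesis's NDM): smallest candidate radius
    whose median CV nearest distance reaches the median prediction distance."""
    target = statistics.median(prediction_nn_distances(pred_xy, sample_xy))
    for r in sorted(candidates):
        ds = cv_nn_distances(sample_xy, r)
        if ds and statistics.median(ds) >= target:
            return r
    return None  # no candidate radius was large enough


# Hypothetical clustered design: two tight clusters, predictions on an 11x11 grid.
samples = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (9.0, 9.0), (9.1, 9.0), (9.0, 9.1)]
grid = [(float(x), float(y)) for x in range(11) for y in range(11)]
print("median LOO nearest distance:", statistics.median(cv_nn_distances(samples, 0.0)))
print("median prediction nearest distance:",
      statistics.median(prediction_nn_distances(grid, samples)))
print("matched radius:", match_radius(grid, samples, [0.0, 0.5, 1.0, 2.0, 4.0]))
```

With two tight clusters, LOO finds a training neighbour only about 0.1 units away, while most grid prediction locations are more than one unit from any sample, so ordinary LOO tests the model on an easier task than the one it faces in prediction; a buffer is needed before the two distance distributions line up.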
dc.title.none.fl_str_mv |
The spatial prediction sandbox - Investigating the use of spatially-explicit modelling and cross-validation strategies in spatial interpolation machine learning problems |
author |
Milà Garcia, Carles |
dc.contributor.none.fl_str_mv |
Meyer, Hanna; Pebesma, Edzer; Mateu Mahiques, Jorge; RUN |
dc.subject.por.fl_str_mv |
Machine learning methods; Nearest Distance Matching; Random Forest |
dc.date.none.fl_str_mv |
2021-03-15T08:48:56Z; 2021-01-29; 2021-01-29T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/113881; TID:202672816 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |