The spatial prediction sandbox - Investigating the use of spatially-explicit modelling and cross-validation strategies in spatial interpolation machine learning problems

Bibliographic details
Main author: Milà Garcia, Carles
Publication date: 2021
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/113881
Summary: Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
id RCAP_afc7a5ce93b0507ee55799eaf3568842
oai_identifier_str oai:run.unl.pt:10362/113881
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Machine Learning (ML) methods are increasingly used for spatial interpolation, and different strategies have been proposed to introduce space into the modelling and validation phases. Nevertheless, a comparison of these methods under different landscape autocorrelation ranges and sampling designs is still missing. This Master Thesis investigates under which scenarios spatially-explicit ML modelling and validation strategies are appropriate for spatial interpolation problems. We designed a framework that allowed us to simulate predictor and outcome spatial fields with different autocorrelation ranges, as well as samples with different numbers of points and distributions. With these data, we tested different non-spatial and spatially-explicit (coordinates, EDF, RFsp) Random Forest ML models and evaluated them using the simulated surfaces as well as different standard (Leave-One-Out, LOO) and spatially-explicit (spatial buffer LOO, sbLOO) Cross-Validation (CV) strategies. We developed a new method called Nearest Distance Matching (NDM) to estimate the appropriate radius for sbLOO CV for spatial interpolation based on sample distribution and landscape range, and compared it to state-of-the-art methods for radius search, which are based only on range. For short ranges, non-spatial models were superior to spatially-explicit models regardless of the sample size and distribution; for long ranges, spatial models performed better under regular and random sampling designs, but not under clustered and non-uniform designs.
CV results indicated that although LOO correctly estimated model performance under random designs, it yielded overestimated errors for regular samples and underestimated errors for clustered and non-uniform designs under long ranges. Results of sbLOO combined with NDM correctly addressed error underestimation of LOO in clustered and non-uniform samples, whereas sbLOO based solely on the range resulted in error overestimation for all designs under long ranges. This Master Thesis provides important insights to the field of predictive mapping: it elucidates in which cases spatially-explicit methods may be preferred, establishes that state-of-the-art approaches for spatial CV designed to assess model transferability are not suited for spatial interpolation, and proposes an alternative.
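The difference between standard LOO and spatial buffer LOO mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: a nearest-neighbour predictor stands in for Random Forest so the example needs only the standard library, the function names `nearest_neighbour_predict` and `buffered_loo_rmse` are hypothetical, and the radius is passed in by hand rather than estimated with NDM.

```python
import math


def nearest_neighbour_predict(train, x, y):
    """Return the value of the training point closest to (x, y).

    Stand-in model (an assumption): the thesis uses Random Forest variants.
    """
    _, value = min(
        ((math.hypot(px - x, py - y), v) for px, py, v in train),
        key=lambda pair: pair[0],
    )
    return value


def buffered_loo_rmse(samples, radius):
    """Leave-one-out CV that also excludes training points closer than
    `radius` to the held-out point. With radius = 0 this is plain LOO."""
    squared_errors = []
    for i, (x, y, v) in enumerate(samples):
        # Keep every other point that lies at least `radius` away.
        train = [
            (px, py, pv)
            for j, (px, py, pv) in enumerate(samples)
            if j != i and math.hypot(px - x, py - y) >= radius
        ]
        if not train:
            continue  # buffer removed all training data for this point
        prediction = nearest_neighbour_predict(train, x, y)
        squared_errors.append((prediction - v) ** 2)
    if not squared_errors:
        raise ValueError("radius leaves no usable training points")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy sample: three clustered points and one isolated point, value = x.
samples = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (10.0, 0.0, 10.0)]
print("LOO RMSE:  ", buffered_loo_rmse(samples, 0.0))
print("sbLOO RMSE:", buffered_loo_rmse(samples, 1.5))
```

In this toy sample the buffer removes the close neighbours of the clustered points, so each of them is predicted from a more distant point and the estimated error grows. Choosing the radius is the part the abstract attributes to NDM, and this sketch does not attempt it.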
dc.title.none.fl_str_mv The spatial prediction sandbox - Investigating the use of spatially-explicit modelling and cross-validation strategies in spatial interpolation machine learning problems
author Milà Garcia, Carles
author_facet Milà Garcia, Carles
author_role author
dc.contributor.none.fl_str_mv Meyer, Hanna
Pebesma, Edzer
Mateu Mahiques, Jorge
RUN
dc.contributor.author.fl_str_mv Milà Garcia, Carles
dc.subject.por.fl_str_mv Machine learning methods
Nearest Distance Matching
Random Forest
topic Machine learning methods
Nearest Distance Matching
Random Forest
description Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
publishDate 2021
dc.date.none.fl_str_mv 2021-03-15T08:48:56Z
2021-01-29
2021-01-29T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/113881
TID:202672816
url http://hdl.handle.net/10362/113881
identifier_str_mv TID:202672816
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138035569262592