Soil type spatial prediction from Random Forest: different training datasets, transferability, accuracy and uncertainty assessment

Bibliographic details
Main author: Machado, Diego Fernandes Terra
Publication date: 2019
Other authors: Silva, Sérgio Henrique Godinho, Curi, Nilton, Menezes, Michele Duarte de
Document type: Article
Language: eng
Source title: Scientia Agrícola (Online)
Full text: https://www.revistas.usp.br/sa/article/view/156957
Abstract: How soil legacy data are used as training datasets, and which soil environmental covariates are selected, can drive the accuracy of machine learning techniques. This study therefore evaluated the ability of the Random Forest algorithm to predict soil classes from different training datasets and to extrapolate that information to a similar area. Four training datasets were extracted from legacy data: a) point data composed of 53 soil samples; b) a 30 m buffer around the soil samples; and soil map polygons excluding c) 20 m and d) 30 m from the polygon boundaries. These four datasets were submitted to principal component analysis (PCA) to reduce dimensionality, and each dataset yielded a new, PCA-derived one. Different combinations of predictor variables were tested. A total of 52 models were evaluated by means of model error, prediction uncertainty, and external validation for overall accuracy and Kappa index. The best result was obtained by reducing the number of predictors with PCA while using information from the buffer around the points. Although Random Forest has been considered a robust spatial prediction model, it was clearly sensitive to the strategy used to select the training dataset. Effort was necessary to find the best training dataset for achieving a suitable level of spatial prediction accuracy. Identifying a specific dataset seems to be better than using a great number of variables or a large volume of training data. These efforts allowed an area 15.5 times larger than the reference area to be mapped accurately.
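The two external-validation measures named in the abstract, overall accuracy and the Kappa index, can be illustrated with a minimal sketch. It assumes the Kappa index is Cohen's kappa computed from reference and predicted class labels at validation points. The function name, the soil class names, and the label lists are hypothetical and invented for illustration; they are not data or code from the study.

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(reference)

    # Observed agreement: share of validation points where the predicted
    # class matches the reference class (this is the overall accuracy).
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: for each class, multiply its proportion in the
    # reference labels by its proportion in the predictions, then sum.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)

    if expected == 1.0:
        # Both lists hold one identical class, so kappa is undefined.
        return observed, float("nan")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical validation labels (invented, not taken from the study).
reference = ["Latosol", "Latosol", "Cambisol", "Cambisol", "Argisol", "Argisol"]
predicted = ["Latosol", "Latosol", "Cambisol", "Latosol", "Argisol", "Cambisol"]

accuracy, kappa = overall_accuracy_and_kappa(reference, predicted)
print(f"overall accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

Kappa discounts the agreement expected by chance, so it is lower than overall accuracy whenever some matches could arise from the class proportions alone. That is one reason the two measures are commonly reported together.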
id USP-18_9e535f6d14aed7d135166e868f0edf39
oai_identifier_str oai:revistas.usp.br:article/156957
network_acronym_str USP-18
network_name_str Scientia Agrícola (Online)
repository_id_str
author Machado, Diego Fernandes Terra
author_facet Machado, Diego Fernandes Terra
Silva, Sérgio Henrique Godinho
Curi, Nilton
Menezes, Michele Duarte de
author_role author
author2 Silva, Sérgio Henrique Godinho
Curi, Nilton
Menezes, Michele Duarte de
author2_role author
author
author
dc.contributor.author.fl_str_mv Machado, Diego Fernandes Terra
Silva, Sérgio Henrique Godinho
Curi, Nilton
Menezes, Michele Duarte de
dc.subject.por.fl_str_mv digital soil mapping
soil survey
legacy data
topic digital soil mapping
soil survey
legacy data
publishDate 2019
dc.date.none.fl_str_mv 2019-04-16
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.revistas.usp.br/sa/article/view/156957
10.1590/1678-992x-2017-0300
url https://www.revistas.usp.br/sa/article/view/156957
identifier_str_mv 10.1590/1678-992x-2017-0300
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv https://www.revistas.usp.br/sa/article/view/156957/152368
dc.rights.driver.fl_str_mv Copyright (c) 2019 Scientia Agricola
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2019 Scientia Agricola
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade de São Paulo. Escola Superior de Agricultura Luiz de Queiroz
publisher.none.fl_str_mv Universidade de São Paulo. Escola Superior de Agricultura Luiz de Queiroz
dc.source.none.fl_str_mv Scientia Agricola; Vol. 76 No. 3 (2019); 243-254
1678-992X
0103-9016
reponame:Scientia Agrícola (Online)
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Scientia Agrícola (Online)
collection Scientia Agrícola (Online)
repository.name.fl_str_mv Scientia Agrícola (Online) - Universidade de São Paulo (USP)
repository.mail.fl_str_mv scientia@usp.br||alleoni@usp.br
_version_ 1800222793958162432