Soil type spatial prediction from Random Forest: different training datasets, transferability, accuracy and uncertainty assessment
Main author: | Machado, Diego Fernandes Terra |
---|---|
Publication date: | 2019 |
Other authors: | Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de |
Document type: | Article |
Language: | English |
Source title: | Scientia Agrícola (Online) |
Full text: | https://www.revistas.usp.br/sa/article/view/156957 |
Abstract: | How soil legacy data are used as training data, and which soil environmental covariates are selected, can drive the accuracy of machine learning techniques. This study therefore evaluated the ability of the Random Forest algorithm to predict soil classes from different training datasets and to extrapolate those predictions to a similar area. Four training datasets were extracted from legacy data: a) point data composed of 53 soil samples; b) a 30 m buffer around the soil samples; and soil map polygons excluding c) 20 m and d) 30 m from the polygon boundaries. Principal component analysis (PCA) was applied to each of these four datasets to reduce dimensionality, so that each one yielded a new, PCA-derived dataset. Different combinations of predictor variables were tested. A total of 52 models were evaluated in terms of model error, prediction uncertainty, and external validation (overall accuracy and Kappa index). The best result was obtained by reducing the number of predictors with PCA and using training data from the buffers around the sample points. Although Random Forest is considered a robust spatial prediction model, it proved sensitive to the strategy used to select the training dataset, and effort was needed to find the training dataset that achieved a suitable level of spatial prediction accuracy. Identifying a suitable training dataset seems to matter more than using a large number of variables or a large volume of training data. These efforts allowed an area 15.5 times larger than the reference area to be mapped accurately. |
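The abstract names two computational steps that a short example can make concrete: extracting training cells from legacy data (a 30 m buffer around sample points, or soil map polygons with a 20 m or 30 m strip removed along their boundaries) and scoring a prediction against external validation sites with overall accuracy and the Kappa index. The pure-Python sketch below is not the authors' code. It assumes a hypothetical 10 m raster (`CELL_SIZE`), a soil map stored as a grid of class labels, centre-to-centre distance as a stand-in for distance to a polygon boundary, and Cohen's Kappa as the Kappa index; all function names are hypothetical, and the Random Forest, covariate and PCA steps are left out.

```python
import math
from collections import Counter

CELL_SIZE = 10.0  # hypothetical raster resolution, in metres (assumption)


def cell_centre(row, col, cell_size=CELL_SIZE):
    """Map coordinates (x, y) in metres of a raster cell's centre."""
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def buffer_training_cells(samples, n_rows, n_cols, radius, cell_size=CELL_SIZE):
    """Label every cell whose centre lies within `radius` metres of a soil sample.

    `samples` holds (x, y, soil_class) tuples in map metres. Where buffers of
    different samples overlap, the first sample in the list wins.
    """
    cells = {}
    for row in range(n_rows):
        for col in range(n_cols):
            cx, cy = cell_centre(row, col, cell_size)
            for x, y, soil_class in samples:
                if math.hypot(cx - x, cy - y) <= radius:
                    cells[(row, col)] = soil_class
                    break
    return cells


def interior_training_cells(class_grid, margin, cell_size=CELL_SIZE):
    """Keep soil-map cells farther than `margin` metres from any other class.

    Distance is measured between cell centres (brute force), which only
    approximates removing a strip of width `margin` inside each polygon.
    """
    n_rows, n_cols = len(class_grid), len(class_grid[0])
    cells = {}
    for r in range(n_rows):
        for c in range(n_cols):
            label = class_grid[r][c]
            here = cell_centre(r, c, cell_size)
            near_other_class = any(
                class_grid[r2][c2] != label
                and math.dist(here, cell_centre(r2, c2, cell_size)) <= margin
                for r2 in range(n_rows)
                for c2 in range(n_cols)
            )
            if not near_other_class:
                cells[(r, c)] = label
    return cells


def overall_accuracy_and_kappa(observed, predicted):
    """Overall accuracy and Cohen's Kappa for paired observed/predicted classes."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(observed)
    # Observed agreement: share of validation sites classified correctly.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: products of the confusion-matrix row and column totals.
    obs_totals, pred_totals = Counter(observed), Counter(predicted)
    p_e = sum(obs_totals[k] * pred_totals[k] for k in obs_totals) / (n * n)
    if p_e == 1.0:  # a single class everywhere: Kappa is undefined
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)
```

For example, `interior_training_cells([["A", "A", "A", "B", "B", "B"]], margin=20)` keeps only the two outermost cells, `{(0, 0): "A", (0, 5): "B"}`, and `overall_accuracy_and_kappa(["A", "A", "B", "B", "C"], ["A", "B", "B", "B", "C"])` gives an overall accuracy of 0.8 and a Kappa of about 0.69. Brute-force distances keep the sketch dependency-free; a real workflow would more likely use vector buffers or a distance transform on the raster.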
Keywords: | digital soil mapping; soil survey; legacy data |
---|---|
Authors: | Machado, Diego Fernandes Terra; Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de |
Date: | 2019-04-16 |
Version: | Published version |
Source: | Scientia Agricola, v. 76, n. 3 (2019), p. 243-254 |
DOI: | 10.1590/1678-992x-2017-0300 |
ISSN: | 1678-992X; 0103-9016 |
Publisher: | Universidade de São Paulo. Escola Superior de Agricultura Luiz de Queiroz |
Full text (PDF): | https://www.revistas.usp.br/sa/article/view/156957/152368 |
Rights: | Copyright (c) 2019 Scientia Agricola; open access |
OAI identifier: | oai:revistas.usp.br:article/156957 |
Repository: | Scientia Agrícola (Online), Universidade de São Paulo (USP) |