Soil type spatial prediction from Random Forest: different training datasets, transferability, accuracy and uncertainty assessment

Machado, Diego Fernandes Terra; Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de

Bibliographic details
Main author:	Machado, Diego Fernandes Terra
Publication date:	2019
Other authors:	Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de
Document type:	Article
Language:	eng
Source title:	Scientia Agrícola (Online)
Full text:	https://www.revistas.usp.br/sa/article/view/156957
Abstract:	How soil legacy data are used as training datasets, together with the selection of soil environmental covariates, can drive the accuracy of machine learning techniques. Thus, this study evaluated the ability of the Random Forest algorithm to predict soil classes from different training datasets and to extrapolate such information to a similar area. The following training datasets were extracted from legacy data: a) point data composed of 53 soil samples; b) a 30 m buffer around the soil samples; and soil map polygons excluding c) 20 m and d) 30 m from the polygon boundaries. These four datasets were submitted to principal component analysis (PCA) to reduce dimensionality, so that each dataset yielded a new, derived one. Different combinations of predictor variables were tested. A total of 52 models were evaluated by means of model error, prediction uncertainty, and external validation for overall accuracy and Kappa index. The best result was obtained by reducing the number of predictors with PCA along with information from the buffer around the points. Although Random Forest has been considered a robust spatial predictor model, it was clearly sensitive to the strategy used to select the training dataset. Effort was necessary to find the best training dataset for achieving a suitable level of accuracy of spatial prediction. Identifying a suitable dataset seems to be better than using a large number of variables or a large volume of training data. These efforts made it possible to accurately map an area 15.5 times larger than the reference area.
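
The abstract reports external validation by overall accuracy and the Kappa index. As a rough illustration only, and not the authors' code, the sketch below computes both measures from paired lists of observed and predicted soil classes. The function name and the soil class labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def overall_accuracy_and_kappa(observed, predicted):
    """Return (overall accuracy, Cohen's Kappa) for paired class labels.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)

    # Observed agreement: share of validation points classified correctly.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n

    # Chance agreement: product of the marginal class frequencies.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)

    if p_e == 1.0:
        # Only one class in both lists: Kappa is undefined.
        return p_o, float("nan")

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Made-up validation points (class names are placeholders).
observed = ["Latosol"] * 4 + ["Cambisol"] * 3 + ["Gleysol"] * 3
predicted = [
    "Latosol", "Latosol", "Latosol", "Cambisol",
    "Cambisol", "Cambisol", "Latosol",
    "Gleysol", "Gleysol", "Gleysol",
]

accuracy, kappa = overall_accuracy_and_kappa(observed, predicted)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.3f}")
```

Kappa discounts the agreement expected by chance from the class frequencies, which is why it is commonly reported next to overall accuracy when some soil classes dominate the validation set.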

Item metadata

id	USP-18_9e535f6d14aed7d135166e868f0edf39
oai_identifier_str	oai:revistas.usp.br:article/156957
network_acronym_str	USP-18
network_name_str	Scientia Agrícola (Online)
repository_id_str
dc.title.none.fl_str_mv	Soil type spatial prediction from Random Forest: different training datasets, transferability, accuracy and uncertainty assessment
author	Machado, Diego Fernandes Terra
author_facet	Machado, Diego Fernandes Terra; Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de
author_role	author
author2	Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de
author2_role	author author author
dc.contributor.author.fl_str_mv	Machado, Diego Fernandes Terra; Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de
dc.subject.por.fl_str_mv	digital soil mapping; soil survey; legacy data
topic	digital soil mapping; soil survey; legacy data
publishDate	2019
dc.date.none.fl_str_mv	2019-04-16
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.revistas.usp.br/sa/article/view/156957 10.1590/1678-992x-2017-0300
url	https://www.revistas.usp.br/sa/article/view/156957
identifier_str_mv	10.1590/1678-992x-2017-0300
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	https://www.revistas.usp.br/sa/article/view/156957/152368
dc.rights.driver.fl_str_mv	Copyright (c) 2019 Scientia Agricola info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Copyright (c) 2019 Scientia Agricola
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade de São Paulo. Escola Superior de Agricultura Luiz de Queiroz
publisher.none.fl_str_mv	Universidade de São Paulo. Escola Superior de Agricultura Luiz de Queiroz
dc.source.none.fl_str_mv	Scientia Agricola; Vol. 76 No. 3 (2019); 243-254; 1678-992X; 0103-9016; reponame:Scientia Agrícola (Online); instname:Universidade de São Paulo (USP); instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Scientia Agrícola (Online)
collection	Scientia Agrícola (Online)
repository.name.fl_str_mv	Scientia Agrícola (Online) - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	scientia@usp.br || alleoni@usp.br
_version_	1800222793958162432
