Soil type spatial prediction from Random Forest: different training datasets, transferability, accuracy and uncertainty assessment

Bibliographic details
Main author: Machado, Diego Fernandes Terra
Publication date: 2019
Other authors: Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de
Document type: Article
Language: English
Source title: Scientia Agrícola (Online)
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-90162019001300243
Abstract: The way soil legacy data are used as training datasets, together with the selection of soil environmental covariates, can drive the accuracy of machine learning techniques. This study therefore evaluated the ability of the Random Forest algorithm to predict soil classes from different training datasets and to extrapolate that information to a similar area. The following training datasets were extracted from legacy data: a) point data composed of 53 soil samples; b) a 30 m buffer around the soil samples; and soil map polygons excluding c) 20 m and d) 30 m from the polygon boundaries. These four datasets were submitted to principal component analysis (PCA) to reduce dimensionality, so that each dataset yielded a new, PCA-derived one. Different combinations of predictor variables were tested. A total of 52 models were evaluated by model error, prediction uncertainty, and external validation using overall accuracy and the Kappa index. The best result was obtained by reducing the number of predictors with PCA and using the information from the buffer around the points. Although Random Forest has been considered a robust spatial prediction model, it was clearly sensitive to the strategy used to select the training dataset. Effort was needed to find the training dataset that achieved a suitable level of spatial prediction accuracy. Identifying a specific dataset appears to be better than using a large number of variables or a large volume of training data. These efforts made it possible to map accurately an area 15.5 times larger than the reference area.
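The abstract reports external validation by overall accuracy and the Kappa index. As a minimal sketch of how these two measures are commonly computed from validation points (this is not the authors' code; the function name and the class labels "A", "B", "C" are illustrative assumptions, not data from the study), in Python:

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: share of validation points assigned the reference class.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement expected from the marginal class frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        # Only one class on both sides: kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Hypothetical validation labels (made up for illustration only).
reference = ["A", "A", "B", "B", "C", "C", "A", "B"]
predicted = ["A", "A", "B", "C", "C", "C", "A", "A"]
acc, kappa = overall_accuracy_and_kappa(reference, predicted)
print(f"{acc:.2f} {kappa:.3f}")  # prints "0.75 0.628"
```

Kappa discounts the agreement expected by chance, which is why it is usually reported alongside overall accuracy when class frequencies are unbalanced.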
id USP-18_2683c6afbaa839f8364124a5fa95c6f6
oai_identifier_str oai:scielo:S0103-90162019001300243
network_acronym_str USP-18
network_name_str Scientia Agrícola (Online)
repository_id_str
Journal website: http://revistas.usp.br/sa/index
OAI endpoint: https://old.scielo.br/oai/scielo-oai.php
ISSN: 1678-992X, 0103-9016
dc.title.none.fl_str_mv Soil type spatial prediction from Random Forest: different training datasets, transferability, accuracy and uncertainty assessment
dc.contributor.author.fl_str_mv Machado,Diego Fernandes Terra
Silva,Sérgio Henrique Godinho
Curi,Nilton
Menezes,Michele Duarte de
dc.subject.por.fl_str_mv digital soil mapping
soil survey
legacy data
topic digital soil mapping
soil survey
legacy data
publishDate 2019
dc.date.none.fl_str_mv 2019-05-01
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-90162019001300243
url http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-90162019001300243
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 10.1590/1678-992x-2017-0300
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv text/html
dc.publisher.none.fl_str_mv Escola Superior de Agricultura "Luiz de Queiroz"
publisher.none.fl_str_mv Escola Superior de Agricultura "Luiz de Queiroz"
dc.source.none.fl_str_mv Scientia Agricola v.76 n.3 2019
reponame:Scientia Agrícola (Online)
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Scientia Agrícola (Online)
collection Scientia Agrícola (Online)
repository.name.fl_str_mv Scientia Agrícola (Online) - Universidade de São Paulo (USP)
repository.mail.fl_str_mv scientia@usp.br||alleoni@usp.br
_version_ 1748936464834494464