Soil type spatial prediction from Random Forest: different training datasets, transferability, accuracy and uncertainty assessment

Machado, Diego Fernandes Terra; Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de

Bibliographic details
Main author:	Machado, Diego Fernandes Terra
Publication date:	2019
Other authors:	Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de
Document type:	Article
Language:	English
Source title:	Scientia Agrícola (Online)
Full text:	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-90162019001300243
Abstract:	Different uses of soil legacy data as training datasets, as well as the selection of soil environmental covariates, can drive the accuracy of machine learning techniques. This study therefore evaluated the ability of the Random Forest algorithm to predict soil classes from different training datasets and to extrapolate this information to a similar area. The following training datasets were extracted from legacy data: a) point data composed of 53 soil samples; b) a 30 m buffer around the soil samples; and soil map polygons excluding c) 20 m and d) 30 m from the polygon boundaries. These four datasets were submitted to principal component analysis (PCA) to reduce dimensionality, so that each original dataset gave rise to a PCA-derived counterpart. Different combinations of predictor variables were tested. A total of 52 models were evaluated in terms of model error, prediction uncertainty, and external validation (overall accuracy and Kappa index). The best result was obtained by reducing the number of predictors with PCA and using the buffer around the sample points as training data. Although Random Forest is considered a robust spatial prediction model, it proved sensitive to the strategy used to select the training dataset, and effort was needed to find the training dataset that achieved a suitable level of spatial prediction accuracy. Identifying a suitable training dataset appears to be more effective than using a large number of variables or a large volume of training data. These efforts made it possible to accurately map an area 15.5 times larger than the reference area.
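The abstract reports external validation by overall accuracy and the Kappa index, plus an assessment of prediction uncertainty. The Python below is a minimal sketch of how such metrics are commonly computed, not the authors' code. `overall_accuracy` and `cohen_kappa` follow the standard definitions (Kappa = (p_o - p_e) / (1 - p_e)). The uncertainty measure in `vote_uncertainty` is an assumption: this record does not say how the paper quantified uncertainty, so the sketch uses 1 minus the share of Random Forest trees voting for the majority class at a location. All function names, soil class labels and vote counts are hypothetical and for illustration only.

```python
from collections import Counter
from typing import Sequence, Tuple


def overall_accuracy(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of validation points whose predicted class equals the observed class."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def cohen_kappa(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e), agreement corrected for chance."""
    n = len(observed)
    p_o = overall_accuracy(observed, predicted)  # also validates the inputs
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    # Expected chance agreement from the marginal class proportions.
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined (0/0): a single class in both lists
    return (p_o - p_e) / (1.0 - p_e)


def vote_uncertainty(tree_votes: Sequence[str]) -> Tuple[str, float]:
    """ASSUMED measure: majority class and 1 - (share of trees voting for it)."""
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    label, count = Counter(tree_votes).most_common(1)[0]
    return label, 1.0 - count / len(tree_votes)


if __name__ == "__main__":
    # Hypothetical validation points; class names are illustrative only.
    obs = ["Latosol", "Latosol", "Cambisol", "Argisol", "Cambisol", "Latosol"]
    pred = ["Latosol", "Cambisol", "Cambisol", "Argisol", "Cambisol", "Latosol"]
    print(f"overall accuracy = {overall_accuracy(obs, pred):.3f}")
    print(f"kappa = {cohen_kappa(obs, pred):.3f}")
    # Hypothetical hard votes of a 10-tree forest at a single map pixel.
    print(vote_uncertainty(["Latosol"] * 7 + ["Cambisol"] * 3))
```

With these hypothetical labels, 5 of 6 points agree (overall accuracy ≈ 0.833), while Kappa is lower (17/23 ≈ 0.739) because part of that agreement is expected by chance from the class proportions alone. Libraries such as scikit-learn average per-tree class probabilities rather than counting hard votes, so a probability-based uncertainty may differ slightly from this vote-based sketch.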

Item metadata

id	USP-18_2683c6afbaa839f8364124a5fa95c6f6
oai_identifier_str	oai:scielo:S0103-90162019001300243
network_acronym_str	USP-18
network_name_str	Scientia Agrícola (Online)
ISSN	1678-992X; 0103-9016
journal homepage	http://revistas.usp.br/sa/index
OAI-PMH endpoint	https://old.scielo.br/oai/scielo-oai.php
record datestamp	2019-02-28
dc.title.none.fl_str_mv	Soil type spatial prediction from Random Forest: different training datasets, transferability, accuracy and uncertainty assessment
dc.contributor.author.fl_str_mv	Machado, Diego Fernandes Terra; Silva, Sérgio Henrique Godinho; Curi, Nilton; Menezes, Michele Duarte de
dc.subject.por.fl_str_mv	digital soil mapping; soil survey; legacy data
publishDate	2019
dc.date.none.fl_str_mv	2019-05-01
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.identifier.uri.fl_str_mv	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-90162019001300243
dc.language.iso.fl_str_mv	eng
dc.relation.none.fl_str_mv	10.1590/1678-992x-2017-0300
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	text/html
dc.publisher.none.fl_str_mv	Escola Superior de Agricultura "Luiz de Queiroz"
dc.source.none.fl_str_mv	Scientia Agricola v.76 n.3 2019 reponame:Scientia Agrícola (Online) instname:Universidade de São Paulo (USP) instacron:USP
repository.name.fl_str_mv	Scientia Agrícola (Online) - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	scientia@usp.br, alleoni@usp.br
