Sample design effects on soil unit prediction with machine: randomness, uncertainty, and majority map.

Bibliographic details
Main author: CARVALHO JUNIOR, W. de
Publication date: 2020
Other authors: PEREIRA, N. R., FERNANDES FILHO, E. I., CALDERANO FILHO, B., PINHEIRO, H. S. K., CHAGAS, C. da S., BHERING, S. B., PEREIRA, V. R., LAWALL, S.
Document type: Article
Language: eng
Source title: Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
Full text: http://www.alice.cnptia.embrapa.br/alice/handle/doc/1124458
https://doi.org/10.36783/18069657rbcs2019
Abstract: Notwithstanding the importance of soil surveys, advances in digital soil mapping have mainly focused on mapping soil attributes or properties rather than developing digital maps of soil units or soil classes. The purpose of this research was to develop digital soil unit maps based on primary soil data collection in areas without previously collected soil information. The study focused on covariate variability, the random effect of splitting the data into subsets, and the map outputs. We used five datasets with four models (Random Forest - RF, Gradient Boosted Machine - GBM, C5.0, and multinomial log-linear model - MLR). The covariates were grouped into five datasets, of which four were grouped by Region Of Interest per Class (ROIC) and one was not. To evaluate the random effect of the dataset split, we ran each model 50 times and examined the overall accuracy (OA), the kappa index, and the uncertainty, majority, and variety maps. The OA of Dataset01 to Dataset04 was lower than that of Dataset05. However, the RF and GBM map outputs for Dataset01 and Dataset05 had the same majority prediction. RF and GBM appear to produce consistent map outputs according to this methodology and pedologist expertise. To evaluate the uncertainty and the consistency of soil unit prediction, we used the majority map process. Random Forest, similar to GBM, presented the best results. Increasing the number of covariates did not guarantee an improvement in the OA or in the quality of the map output. Geographic position and distance rasters did not improve the map output according to expert evaluation. Because of the variance between the ROICs, training and validation subsets split by ROIC differ considerably in their covariates, which explains the worse results of the ROIC-based datasets compared with Dataset05. In contrast, when one complete dataset not based on ROICs is used, the variance between the training and validation subsets is lower, which produced better quality metrics.
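The repeated-run evaluation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the array layout (one predicted class raster per run), and the tie-breaking rule are assumptions made for illustration. The abstract does not define how the uncertainty map was computed, so the disagreement fraction below is only one plausible proxy.

```python
import numpy as np


def overall_accuracy_and_kappa(y_true, y_pred, n_classes):
    """Overall accuracy and Cohen's kappa from integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows = reference class, columns = predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    n = cm.sum()
    observed = np.trace(cm) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / (n * n)
    # Note: undefined (division by zero) if expected agreement equals 1.
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


def summarize_runs(stack, n_classes):
    """Summarize repeated predictions.

    stack: integer array of shape (n_runs, rows, cols) holding the class id
    predicted at each pixel in each run (hypothetical layout).
    """
    stack = np.asarray(stack)
    # counts[k, i, j] = number of runs that predicted class k at pixel (i, j).
    counts = np.stack([(stack == k).sum(axis=0) for k in range(n_classes)])
    # Majority map: most frequent class per pixel (ties go to the lowest id).
    majority = counts.argmax(axis=0)
    # Variety map: number of distinct classes predicted per pixel.
    variety = (counts > 0).sum(axis=0)
    # Assumed uncertainty proxy: share of runs that disagree with the majority.
    disagreement = 1.0 - counts.max(axis=0) / stack.shape[0]
    return majority, variety, disagreement


if __name__ == "__main__":
    # Toy example: 4 runs over a 1 x 2 raster with 3 possible classes.
    runs = np.array([[[0, 1]], [[0, 2]], [[0, 1]], [[1, 1]]])
    majority, variety, disagreement = summarize_runs(runs, n_classes=3)
    print(majority, variety, disagreement)
```

In the study's setting the stack would hold 50 runs per model and dataset. A pixel with variety 1 received the same soil unit in every run, while higher variety or disagreement values flag locations where the random split changed the prediction.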
id EMBR_2d5a92965c056a2d52c52e304feaddd5
oai_identifier_str oai:www.alice.cnptia.embrapa.br:doc/1124458
network_acronym_str EMBR
network_name_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository_id_str 2154
dc.contributor.none.fl_str_mv WALDIR DE CARVALHO JUNIOR, CNPS; NILSON RENDEIRO PEREIRA, CNPS; ELPIDIO INACIO FERNANDES FILHO, UFV; BRAZ CALDERANO FILHO, CNPS; HELENA SARAIVA KOENOW PINHEIRO, UFRRJ; CESAR DA SILVA CHAGAS, CNPS; SILVIO BARGE BHERING, CNPS; VINICIUS RENDEIRO PEREIRA, UFRRJ; SARA LAWALL, UFRRJ.
dc.subject.por.fl_str_mv Digital soil mapping (Mapeamento digital de solos)
Tree learners models
Hillslope areas
Random forest
Map (Mapa)
Soil (Solo)
Soil map
publishDate 2020
dc.date.none.fl_str_mv 2020-08-21T04:11:41Z
2020-08-20
2020
dc.type.driver.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv Revista Brasileira de Ciência do Solo, v. 44, e0190120, 2020.
http://www.alice.cnptia.embrapa.br/alice/handle/doc/1124458
https://doi.org/10.36783/18069657rbcs2019
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron:EMBRAPA
repository.name.fl_str_mv Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
repository.mail.fl_str_mv cg-riaa@embrapa.br
_version_ 1794503494856605696