Data mining applied to feature selection methods for aboveground carbon stock modelling

Bibliographic details
Main author: Carvalho, Mônica Canaan
Publication date: 2022
Other authors: Gomide, Lucas Rezende; Scolforo, José Roberto Soares; Páscoa, Kalill José Viana da; Araújo, Laís Almeida; Lopes, Isáira Leite e
Document type: Article
Language: eng
Source title: Repositório Institucional da UFLA
Full text: http://repositorio.ufla.br/jspui/handle/1/56516
Abstract: The objective of this work was to apply the random forest (RF) algorithm to the modelling of the aboveground carbon (AGC) stock of a tropical forest by testing three feature selection procedures: recursive removal, the uniobjective genetic algorithm (GA), and the multiobjective GA. The database covered 1,007 plots sampled in the Rio Grande watershed, in the state of Minas Gerais, Brazil, and 114 environmental variables (climatic, edaphic, geographic, terrain, and spectral). The best feature selection strategy, RF with multiobjective GA, reached the lowest root-mean-square error, 17.75 Mg ha⁻¹, with only four spectral variables (normalized difference moisture index, normalized burn ratio 2 correlation texture, tree cover, and latent heat flux), which represents a reduction of 96.5% in the size of the database. Feature selection strategies help RF perform better by improving accuracy and reducing the volume of data. Although recursive removal and the multiobjective GA performed similarly as feature selection strategies, the latter yielded the smallest subset of variables with the highest accuracy. The findings of this study highlight the importance of using near infrared, short wavelengths, and derived vegetation indices for the remote sensing-based estimation of AGC. The MODIS products show a significant relationship with the AGC stock and should be further explored by the scientific community for the modelling of this stock.
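The figures in the abstract can be checked with a short sketch. The variable counts (114 and 4) come from the abstract. The band formulas are the commonly used definitions of the normalized difference moisture index and normalized burn ratio 2, and are an assumption here, since this record does not state which bands the authors used. The selected NBR2 variable is a correlation texture derived from the index, which this sketch does not compute. All function names are hypothetical.

```python
# Variable counts taken from the abstract.
TOTAL_VARIABLES = 114
SELECTED_VARIABLES = 4


def reduction_percent(total: int, kept: int) -> float:
    """Share of variables discarded by feature selection, in percent."""
    return 100.0 * (total - kept) / total


def normalized_difference(band_a: float, band_b: float) -> float:
    """Generic normalized difference (a - b) / (a + b) of two reflectances."""
    return (band_a - band_b) / (band_a + band_b)


def ndmi(nir: float, swir1: float) -> float:
    """NDMI under the common definition (assumed, not stated in the record)."""
    return normalized_difference(nir, swir1)


def nbr2(swir1: float, swir2: float) -> float:
    """NBR2 under the common definition (assumed, not stated in the record)."""
    return normalized_difference(swir1, swir2)


if __name__ == "__main__":
    # Keeping 4 of 114 variables discards 110 of them.
    print(f"{reduction_percent(TOTAL_VARIABLES, SELECTED_VARIABLES):.1f}%")
```

Keeping 4 of 114 variables removes 110, or 110/114 of the predictors, which rounds to the 96.5% reported in the abstract.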
id UFLA_57937e50c5a07a01fbc28f6f4944e8ba
oai_identifier_str oai:localhost:1/56516
network_acronym_str UFLA
network_name_str Repositório Institucional da UFLA
repository_id_str
dc.title.none.fl_str_mv Data mining applied to feature selection methods for aboveground carbon stock modelling
Mineração de dados aplicada a métodos de seleção de variáveis para a modelagem de estoque de carbono acima do solo
title Data mining applied to feature selection methods for aboveground carbon stock modelling
spellingShingle Data mining applied to feature selection methods for aboveground carbon stock modelling
Carvalho, Mônica Canaan
Forest management
Genetic algorithm
Random forest
Manejo florestal
Algoritmo genético
Floresta aleatória
title_short Data mining applied to feature selection methods for aboveground carbon stock modelling
title_full Data mining applied to feature selection methods for aboveground carbon stock modelling
title_fullStr Data mining applied to feature selection methods for aboveground carbon stock modelling
title_full_unstemmed Data mining applied to feature selection methods for aboveground carbon stock modelling
title_sort Data mining applied to feature selection methods for aboveground carbon stock modelling
author Carvalho, Mônica Canaan
author_facet Carvalho, Mônica Canaan
Gomide, Lucas Rezende
Scolforo, José Roberto Soares
Páscoa, Kalill José Viana da
Araújo, Laís Almeida
Lopes, Isáira Leite e
author_role author
author2 Gomide, Lucas Rezende
Scolforo, José Roberto Soares
Páscoa, Kalill José Viana da
Araújo, Laís Almeida
Lopes, Isáira Leite e
author2_role author
author
author
author
author
dc.contributor.author.fl_str_mv Carvalho, Mônica Canaan
Gomide, Lucas Rezende
Scolforo, José Roberto Soares
Páscoa, Kalill José Viana da
Araújo, Laís Almeida
Lopes, Isáira Leite e
dc.subject.por.fl_str_mv Forest management
Genetic algorithm
Random forest
Manejo florestal
Algoritmo genético
Floresta aleatória
topic Forest management
Genetic algorithm
Random forest
Manejo florestal
Algoritmo genético
Floresta aleatória
publishDate 2022
dc.date.none.fl_str_mv 2022
2023-04-05T18:22:13Z
2023-04-05T18:22:13Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv CARVALHO, M. C. et al. Data mining applied to feature selection methods for aboveground carbon stock modelling. Pesquisa Agropecuária Brasileira, Brasília, DF, v. 57, p. 1-13, 2022. DOI: 10.1590/S1678-3921.pab2022.v57.03015.
http://repositorio.ufla.br/jspui/handle/1/56516
identifier_str_mv CARVALHO, M. C. et al. Data mining applied to feature selection methods for aboveground carbon stock modelling. Pesquisa Agropecuária Brasileira, Brasília, DF, v. 57, p. 1-13, 2022. DOI: 10.1590/S1678-3921.pab2022.v57.03015.
url http://repositorio.ufla.br/jspui/handle/1/56516
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
publisher.none.fl_str_mv Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
dc.source.none.fl_str_mv Pesquisa Agropecuária Brasileira (PAB)
reponame:Repositório Institucional da UFLA
instname:Universidade Federal de Lavras (UFLA)
instacron:UFLA
instname_str Universidade Federal de Lavras (UFLA)
instacron_str UFLA
institution UFLA
reponame_str Repositório Institucional da UFLA
collection Repositório Institucional da UFLA
repository.name.fl_str_mv Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv nivaldo@ufla.br || repositorio.biblioteca@ufla.br
_version_ 1784550205576708096