Soil Classification Resorting to Machine Learning Techniques
Autor(a) principal: | |
---|---|
Data de Publicação: | 2019 |
Tipo de documento: | Dissertação |
Idioma: | eng |
Título da fonte: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Texto Completo: | http://hdl.handle.net/10362/125335 |
Resumo: | Soil classification is the act of resuming the most relevant information about a soil profile into a single class, from which we can infer a large amount of properties without extensive knowledge of the subject. These classes then make the communication of soils, and how they can best be used in areas such as agriculture and forestry, simpler and easier to understand. Unfortunately soil classification is expensive and requires that specialists perform varied experiments, to be able to precisely attribute a class to a soil profile. This master’s thesis focuses on machine learning algorithms for soil classification mainly based on its intrinsic attributes, in the Mexico region. The data set used contains 6 760 soil profiles, the 19 464 horizons that constitute them, as well as physical and chemical properties, such as pH or organic content, belonging to those horizons. Four data modelling methods were tested (i.e., standard depths, n first layers, thickness, and area weighted thickness), as well as different values for a k-Nearest Neighbours imputation. A comparison between state of the art machine learning algorithms was also made, namely Random Forests, Gradient Tree Boosting, Deep Neural Networks and Recurrent Neural Networks. All of our modelling methods provided very similar results, when properly parametrised, reaching Kappa values of 0.504 and an accuracy of 0.554, with the standard depths method providing the most consistent results. The k parameter for the imputation showed very little impact on the variation on the results. Gradient Tree Boosting was the algorithm with the best overall results, closely followed by the Random Forests model. The neuron based methods never achieved a Kappa score over 0.4, therefore providing substantially worse results. |
id |
RCAP_2b79730837da0090270eeba67fa3eb72 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/125335 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Soil Classification Resorting to Machine Learning TechniquesSoil ClassificationSoil PropertiesEnsemble LearningNeural NetworksGradient Tree BoostingRandom ForestsDomínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e InformáticaSoil classification is the act of resuming the most relevant information about a soil profile into a single class, from which we can infer a large amount of properties without extensive knowledge of the subject. These classes then make the communication of soils, and how they can best be used in areas such as agriculture and forestry, simpler and easier to understand. Unfortunately soil classification is expensive and requires that specialists perform varied experiments, to be able to precisely attribute a class to a soil profile. This master’s thesis focuses on machine learning algorithms for soil classification mainly based on its intrinsic attributes, in the Mexico region. The data set used contains 6 760 soil profiles, the 19 464 horizons that constitute them, as well as physical and chemical properties, such as pH or organic content, belonging to those horizons. Four data modelling methods were tested (i.e., standard depths, n first layers, thickness, and area weighted thickness), as well as different values for a k-Nearest Neighbours imputation. A comparison between state of the art machine learning algorithms was also made, namely Random Forests, Gradient Tree Boosting, Deep Neural Networks and Recurrent Neural Networks. All of our modelling methods provided very similar results, when properly parametrised, reaching Kappa values of 0.504 and an accuracy of 0.554, with the standard depths method providing the most consistent results. The k parameter for the imputation showed very little impact on the variation on the results. Gradient Tree Boosting was the algorithm with the best overall results, closely followed by the Random Forests model. The neuron based methods never achieved a Kappa score over 0.4, therefore providing substantially worse results.A classificação de solos é o ato de resumir a informação sobre um perfil do solo em uma única classe, da qual é possivel inferir várias propriedades, mesmo com a ausência de conhecimento sobre a área de estudo. Estas classes fazem a comunicação dos solos e de como estes podem ser usados, em áreas como a agricultura e silvicultura, mais simples de perceber. Infelizmente a classificação de solos é dispendiosa, demorada, e requer especialistas para realizar as experiências necessárias para classificar corretamente o solo em causa. A presente tese de mestrado focou-se na avaliação de algoritmos de aprendizagem automática para o problema de classificação de solos, baseada maioritariamente nos atributos intrínsecos destes, na região do México. Foi utilizada uma base de dados contendo 6 760 perfis de solos, os 19 464 horizontes que os constituem, e as propriedades químicas e físicas, como o pH e a percentagem de barro, pertencentes a esses horizontes. Quatro métodos de modelação de dados foram testados (standard depths, n first layers, thickness, e area weighted thickness), tal como diferentes valores para uma imputação baseada em k-Nearest Neighbours. Também foi realizada uma comparação entre algoritmos de aprendizagem automática, nomeadamente Random Forests, Gradient Tree Boosting, Deep Neural Networks e Recurrent Neural Networks. Todas as modelações de dados providenciaram resultados similares, quando propriamente parametrisados, atingindo valores de Kappa de 0.504 e accuracy de 0.554, sendo que o métdodo standard depths obteve uma performance mais consistente. O parâmetro k, referente ao método de imputação, revelou ter pouco impacto na variação dos resultados. O algoritmo Gradient Tree Boosting foi o que obteve melhores resultados, seguido de perto pelo modelo de Random Forests. Os métodos baseados em neurónios tiveram resultados substancialmente piores, nunca superando um valor de Kappa de 0.4.Pires, JoãoMartins, BrunoRUNDias, Didier Narciso2021-09-29T15:15:59Z2019-112019-11-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10362/125335enginfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-03-11T05:06:23Zoai:run.unl.pt:10362/125335Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T03:45:41.715094Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse |
dc.title.none.fl_str_mv |
Soil Classification Resorting to Machine Learning Techniques |
title |
Soil Classification Resorting to Machine Learning Techniques |
spellingShingle |
Soil Classification Resorting to Machine Learning Techniques Dias, Didier Narciso Soil Classification Soil Properties Ensemble Learning Neural Networks Gradient Tree Boosting Random Forests Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
title_short |
Soil Classification Resorting to Machine Learning Techniques |
title_full |
Soil Classification Resorting to Machine Learning Techniques |
title_fullStr |
Soil Classification Resorting to Machine Learning Techniques |
title_full_unstemmed |
Soil Classification Resorting to Machine Learning Techniques |
title_sort |
Soil Classification Resorting to Machine Learning Techniques |
author |
Dias, Didier Narciso |
author_facet |
Dias, Didier Narciso |
author_role |
author |
dc.contributor.none.fl_str_mv |
Pires, João Martins, Bruno RUN |
dc.contributor.author.fl_str_mv |
Dias, Didier Narciso |
dc.subject.por.fl_str_mv |
Soil Classification Soil Properties Ensemble Learning Neural Networks Gradient Tree Boosting Random Forests Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
topic |
Soil Classification Soil Properties Ensemble Learning Neural Networks Gradient Tree Boosting Random Forests Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
description |
Soil classification is the act of resuming the most relevant information about a soil profile into a single class, from which we can infer a large amount of properties without extensive knowledge of the subject. These classes then make the communication of soils, and how they can best be used in areas such as agriculture and forestry, simpler and easier to understand. Unfortunately soil classification is expensive and requires that specialists perform varied experiments, to be able to precisely attribute a class to a soil profile. This master’s thesis focuses on machine learning algorithms for soil classification mainly based on its intrinsic attributes, in the Mexico region. The data set used contains 6 760 soil profiles, the 19 464 horizons that constitute them, as well as physical and chemical properties, such as pH or organic content, belonging to those horizons. Four data modelling methods were tested (i.e., standard depths, n first layers, thickness, and area weighted thickness), as well as different values for a k-Nearest Neighbours imputation. A comparison between state of the art machine learning algorithms was also made, namely Random Forests, Gradient Tree Boosting, Deep Neural Networks and Recurrent Neural Networks. All of our modelling methods provided very similar results, when properly parametrised, reaching Kappa values of 0.504 and an accuracy of 0.554, with the standard depths method providing the most consistent results. The k parameter for the imputation showed very little impact on the variation on the results. Gradient Tree Boosting was the algorithm with the best overall results, closely followed by the Random Forests model. The neuron based methods never achieved a Kappa score over 0.4, therefore providing substantially worse results. |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019-11 2019-11-01T00:00:00Z 2021-09-29T15:15:59Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/125335 |
url |
http://hdl.handle.net/10362/125335 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138061282443264 |