Soil Classification Resorting to Machine Learning Techniques

Dias, Didier Narciso

Bibliographic details
Main author:	Dias, Didier Narciso
Publication date:	2019
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	http://hdl.handle.net/10362/125335
Abstract:	Soil classification is the act of summarising the most relevant information about a soil profile into a single class, from which a large number of properties can be inferred without extensive knowledge of the subject. These classes make it simpler to communicate about soils and how they can best be used in areas such as agriculture and forestry. Unfortunately, soil classification is expensive and requires specialists to perform varied experiments in order to assign a class to a soil profile precisely. This master's thesis focuses on machine learning algorithms for soil classification, based mainly on the soils' intrinsic attributes, in the Mexico region. The data set contains 6 760 soil profiles, the 19 464 horizons that constitute them, and the physical and chemical properties of those horizons, such as pH or organic content. Four data modelling methods were tested (standard depths, n first layers, thickness, and area weighted thickness), as well as different values of k for a k-Nearest Neighbours imputation. State-of-the-art machine learning algorithms were also compared, namely Random Forests, Gradient Tree Boosting, Deep Neural Networks and Recurrent Neural Networks. When properly parametrised, all of the modelling methods provided very similar results, reaching Kappa values of 0.504 and an accuracy of 0.554, with the standard depths method providing the most consistent results. The k parameter of the imputation had very little impact on the variation in the results. Gradient Tree Boosting was the algorithm with the best overall results, closely followed by the Random Forests model. The neuron-based methods never achieved a Kappa score over 0.4 and therefore provided substantially worse results.
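
The abstract reports both a Kappa value and an accuracy. As a rough illustration of how the two relate, the sketch below computes Cohen's kappa, which rescales the observed agreement (plain accuracy) by the agreement expected by chance from the label frequencies. It assumes that the Kappa in the abstract is Cohen's kappa, which the record does not state. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the thesis or its data set.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o is the plain accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement p_e, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both labelings use one and the same class: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy example (hypothetical labels, not data from the thesis).
y_true = ["Vertisol", "Vertisol", "Leptosol", "Leptosol", "Regosol", "Regosol"]
y_pred = ["Vertisol", "Leptosol", "Leptosol", "Leptosol", "Regosol", "Vertisol"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")  # accuracy = 0.667, kappa = 0.500
```

In this toy case four of six predictions are correct, but one third of that agreement would be expected by chance, so kappa is lower than the accuracy. This is why a kappa score is usually read alongside accuracy for multi-class problems with uneven class frequencies.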

Item metadata

id	RCAP_2b79730837da0090270eeba67fa3eb72
oai_identifier_str	oai:run.unl.pt:10362/125335
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
author	Dias, Didier Narciso
dc.contributor.none.fl_str_mv	Pires, João; Martins, Bruno; RUN
dc.subject.por.fl_str_mv	Soil Classification; Soil Properties; Ensemble Learning; Neural Networks; Gradient Tree Boosting; Random Forests; Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
publishDate	2019
dc.date.none.fl_str_mv	2019-11 2019-11-01T00:00:00Z 2021-09-29T15:15:59Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/125335
url	http://hdl.handle.net/10362/125335
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799138061282443264
