Improving Tree-based Pipeline Optimization Tool with Geometric Semantic Genetic Programming

Bibliographic details
Main author: Chhotobhai, Helena Hetal
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/145535
Abstract: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
id RCAP_99a0ad442adf8aff00e9ee7686b40b6d
oai_identifier_str oai:run.unl.pt:10362/145535
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Improving Tree-based Pipeline Optimization Tool with Geometric Semantic Genetic Programming
Automated Machine Learning
Genetic Programming
Geometric Semantic Genetic Programming
Tree-based Pipeline Optimization Tool
Regression
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Machine Learning (ML) is becoming part of our lives, from face recognition to the sensors of the latest cars. However, building ML pipelines is a time-consuming and expensive process, even for experts with knowledge of ML algorithms, because each step offers several options. To overcome this issue, Automated ML (AutoML) was introduced, automating some steps of this process. One of its recent algorithms is the Tree-Based Pipeline Optimization Tool (TPOT), an Evolutionary Algorithm (EA) that automatically designs and optimizes ML pipelines using Genetic Programming (GP). Another recent algorithm is Geometric Semantic Genetic Programming (GSGP), an EA characterized by its use of semantics (the vector of outputs of a program on the training data) and by searching directly in the semantic space of programs through geometric semantic operators, which leads to a unimodal fitness landscape. In this work, a new version of TPOT was created, called TPOT-GSGP, in which GSGP is one of the options for model selection. This new algorithm was implemented in Python, for regression problems only, using Negative Mean Absolute Error as the error measure. Five case studies were used to compare the performance of three algorithms: TPOT-GSGP, the original TPOT, and GSGP. Additionally, the statistical significance of the difference in the last generation's score for each pair of algorithms was checked with Wilcoxon tests.
No single algorithm outperformed the others on all datasets: sometimes TPOT-GSGP was best and sometimes TPOT, depending on the case study and on the score analysed (learning or test). It was concluded that whenever GSGP was chosen as the root 50% of the time or more, TPOT-GSGP outperformed TPOT on the test set. Therefore, the advantages of this new algorithm could be extraordinary with further development and adjustment in future work.
Vanneschi, Leonardo
RUN
Chhotobhai, Helena Hetal
2022-11-15T16:43:09Z
2022-10-25
2022-10-25T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/145535
TID:203097912
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T05:26:02Z
oai:run.unl.pt:10362/145535
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:52:08.310025
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Improving Tree-based Pipeline Optimization Tool with Geometric Semantic Genetic Programming
author Chhotobhai, Helena Hetal
author_facet Chhotobhai, Helena Hetal
author_role author
dc.contributor.none.fl_str_mv Vanneschi, Leonardo
RUN
dc.contributor.author.fl_str_mv Chhotobhai, Helena Hetal
dc.subject.por.fl_str_mv Automated Machine Learning
Genetic Programming
Geometric Semantic Genetic Programming
Tree-based Pipeline Optimization Tool
Regression
description Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
publishDate 2022
dc.date.none.fl_str_mv 2022-11-15T16:43:09Z
2022-10-25
2022-10-25T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/145535
TID:203097912
url http://hdl.handle.net/10362/145535
identifier_str_mv TID:203097912
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138112992968704