Improving Tree-based Pipeline Optimization Tool with Geometric Semantic Genetic Programming
Main author: | Chhotobhai, Helena Hetal |
---|---|
Publication date: | 2022 |
Document type: | Dissertation |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/145535 |
Summary: | Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science |
id | RCAP_99a0ad442adf8aff00e9ee7686b40b6d |
---|---|
oai_identifier_str | oai:run.unl.pt:10362/145535 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str | 7160 |
abstract |
Machine Learning (ML) is becoming part of our lives, from face recognition to the sensors in the latest cars. However, building ML pipelines is a time-consuming and expensive process, even for experts with knowledge of ML algorithms, because each step offers many options. To overcome this issue, Automated ML (AutoML) was introduced to automate some steps of this process. One recent AutoML algorithm is the Tree-Based Pipeline Optimization Tool (TPOT), an Evolutionary Algorithm (EA) that automatically designs and optimizes ML pipelines using Genetic Programming (GP). Another recent algorithm is Geometric Semantic Genetic Programming (GSGP), an EA characterized by its use of semantics (the vector of a program's outputs on the training data) and by searching directly in the semantic space through geometric semantic operators, which leads to a unimodal fitness landscape. In this work, a new version of TPOT, called TPOT-GSGP, was created, in which GSGP is one of the options for model selection. The new algorithm was implemented in Python, supports only regression problems, and uses Negative Mean Absolute Error as the error measure. Five case studies were used to compare the performance of three algorithms: TPOT-GSGP, the original TPOT, and GSGP. Additionally, for each pair of algorithms, the statistical significance of the difference in last-generation scores was checked with Wilcoxon tests.
No single algorithm outperformed the others on all datasets: sometimes TPOT-GSGP performed best and sometimes TPOT, depending on the case study and on the score analysed (learning or test). It was observed that whenever GSGP was chosen as the root at least 50% of the time, TPOT-GSGP outperformed TPOT on the test set. This suggests that, with further development and tuning in future work, the new algorithm could offer substantial advantages. |
author | Chhotobhai, Helena Hetal |
dc.contributor.none.fl_str_mv | Vanneschi, Leonardo; RUN |
dc.subject.por.fl_str_mv | Automated Machine Learning; Genetic Programming; Geometric Semantic Genetic Programming; Tree-based Pipeline Optimization Tool; Regression |
dc.date.none.fl_str_mv | 2022-11-15T16:43:09Z; 2022-10-25; 2022-10-25T00:00:00Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10362/145535; TID:203097912 |
dc.language.iso.fl_str_mv | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
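The abstract characterizes GSGP by two ideas. The first is its use of semantics (the vector of a program's outputs on the training data). The second is geometric semantic operators that search the semantic space directly. The Python below is a minimal sketch of those two ideas, not the dissertation's implementation. It shows the standard geometric semantic crossover (child = s1·r + s2·(1 − r)) and mutation (child = s + step·(r1 − r2)) acting directly on semantics vectors. It also shows the Negative Mean Absolute Error score mentioned as the error measure.

It rests on several assumptions:

- NumPy is available.
- The random programs that GSGP normally generates as GP trees are replaced by hypothetical random linear functions squashed through a logistic into [0, 1].
- The toy data, the two parent programs and the mutation step of 0.1 are illustrative only.
- The helper names `neg_mae`, `random_unit_semantics`, `gs_crossover` and `gs_mutation` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def neg_mae(y_true, y_pred):
    """Negative Mean Absolute Error: higher (closer to 0) is better."""
    return -np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))


def random_unit_semantics(X, rng):
    """Semantics of a hypothetical random program, squashed into [0, 1].

    Assumption: a random linear function passed through a logistic stands
    in for the random GP trees that GSGP would normally generate.
    """
    w = rng.normal(size=X.shape[1])
    b = rng.normal()
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def gs_crossover(s1, s2, X, rng):
    """Geometric semantic crossover on semantics: s1 * r + s2 * (1 - r)."""
    r = random_unit_semantics(X, rng)
    return s1 * r + s2 * (1.0 - r)


def gs_mutation(s, X, rng, step=0.1):
    """Geometric semantic mutation on semantics: s + step * (r1 - r2)."""
    r1 = random_unit_semantics(X, rng)
    r2 = random_unit_semantics(X, rng)
    return s + step * (r1 - r2)


# Toy regression data (hypothetical, for illustration only).
X = rng.normal(size=(20, 3))
y = X[:, 0] + 2.0 * X[:, 1]

# Semantics of two hypothetical parent programs: their outputs on X.
s1 = X[:, 0]
s2 = 2.0 * X[:, 1]

child = gs_crossover(s1, s2, X, rng)
mutant = gs_mutation(child, X, rng, step=0.1)

# Each coordinate of the crossover child lies between the parents' outputs.
lo, hi = np.minimum(s1, s2), np.maximum(s1, s2)
print(np.all((child >= lo - 1e-12) & (child <= hi + 1e-12)))
print(neg_mae(y, s1), neg_mae(y, s2), neg_mae(y, child), neg_mae(y, mutant))
```

The crossover weight r lies in [0, 1] for every training case, so each coordinate of the child's semantics sits between its parents' outputs. Mutation moves each coordinate by less than `step`. Because the offspring's semantics are computed from the parents' semantics alone, the growing offspring expression never has to be re-evaluated. For comparison, scikit-learn exposes the same score under the scorer name `'neg_mean_absolute_error'`.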