Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study

Bibliographic details
Main author: Rosa, Liliana Monteiro
Publication date: 2018
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/33784
Abstract: Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
id RCAP_75a7b9d9a737ef53f85eccb25535c1d8
oai_identifier_str oai:run.unl.pt:10362/33784
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study
Data mining
Machine learning
Epidermal growth factor receptor
PubChem
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Data Mining and Machine Learning algorithms and methods have become increasingly important for several industries due to the amount of available data that has grown exponentially in recent years and led to the need for effective ways of gaining insights from that data. In this study, these methods are applied to the prediction of Epidermal Growth Factor Receptor inhibitors using data extracted from PubChem’s database. PubChem is a freely accessible chemical repository that contains information submitted from several different sources, and that comprises three databases, one of which provides information about BioAssays, that is, assays with the purpose of screening numerous compounds for activity on a particular biological target. In this work, the dataset used to train and evaluate the developed models resulted from the information gathered from the assays performed to identify inhibitors of EGFR and the source for the features used to characterize the compounds was PubChem’s own chemical descriptor, the Substructure Fingerprint. The work comprises a literature review on this subject and the implementation of a methodology that tests the performance of different types of classifiers for the problem at hand, namely Naïve Bayes, Decision Tree, Logistic Regression, k-Nearest Neighbors, Support Vector Machine, Multilayer Perceptron, Random Forest, Extremely Randomized Trees, Bagging, Boosting and Voting. Considering both the evaluated quality metrics and the model’s computational burden, the Multilayer Perceptron was considered the best model, although some of the other models had close performances. It was concluded that the methodology used and the developed models had good quality, as did PubChem’s Substructure Fingerprint as a descriptor, but that there was still room for improvement that could be achieved with further experimentation on different aspects of the methodology.
Castelli, Mauro
RUN
Rosa, Liliana Monteiro
2018-04-04T13:38:45Z
2018-03-20
2018-03-20T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/33784
TID:201893614
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T04:18:32Z
oai:run.unl.pt:10362/33784
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:30:04.358141
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study
title Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study
spellingShingle Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study
Rosa, Liliana Monteiro
Data mining
Machine learning
Epidermal growth factor receptor
PubChem
author Rosa, Liliana Monteiro
author_facet Rosa, Liliana Monteiro
author_role author
dc.contributor.none.fl_str_mv Castelli, Mauro
RUN
dc.contributor.author.fl_str_mv Rosa, Liliana Monteiro
dc.subject.por.fl_str_mv Data mining
Machine learning
Epidermal growth factor receptor
PubChem
topic Data mining
Machine learning
Epidermal growth factor receptor
PubChem
description Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
publishDate 2018
dc.date.none.fl_str_mv 2018-04-04T13:38:45Z
2018-03-20
2018-03-20T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/33784
TID:201893614
url http://hdl.handle.net/10362/33784
identifier_str_mv TID:201893614
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137925187764224