Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study
Main author: | Rosa, Liliana Monteiro |
---|---|
Publication date: | 2018 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/33784 |
Abstract: | Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence |
id |
RCAP_75a7b9d9a737ef53f85eccb25535c1d8 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/33784 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study
Data mining; Machine learning; Epidermal growth factor receptor; PubChem
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Data mining and machine learning algorithms and methods have become increasingly important in several industries because the amount of available data has grown exponentially in recent years, creating a need for effective ways of gaining insights from that data. In this study, these methods are applied to the prediction of Epidermal Growth Factor Receptor (EGFR) inhibitors using data extracted from PubChem’s database. PubChem is a freely accessible chemical repository that contains information submitted from several different sources and comprises three databases, one of which provides information about BioAssays, that is, assays that screen numerous compounds for activity on a particular biological target. In this work, the dataset used to train and evaluate the models was built from the assays performed to identify inhibitors of EGFR, and the features used to characterize the compounds came from PubChem’s own chemical descriptor, the Substructure Fingerprint. The work comprises a literature review on this subject and the implementation of a methodology that tests the performance of different types of classifiers for the problem at hand, namely Naïve Bayes, Decision Tree, Logistic Regression, k-Nearest Neighbors, Support Vector Machine, Multilayer Perceptron, Random Forest, Extremely Randomized Trees, Bagging, Boosting and Voting. Considering both the evaluated quality metrics and each model’s computational burden, the Multilayer Perceptron was considered the best model, although some of the other models performed similarly.
It was concluded that the methodology and the developed models were of good quality, as was PubChem’s Substructure Fingerprint as a descriptor, but that there was still room for improvement, which could be achieved through further experimentation on different aspects of the methodology.
Castelli, Mauro; RUN; Rosa, Liliana Monteiro; 2018-04-04T13:38:45Z; 2018-03-20; 2018-03-20T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/33784; TID:201893614; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T04:18:32Z; oai:run.unl.pt:10362/33784; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:30:04.358141; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study |
title |
Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study |
spellingShingle |
Using PubChem’s database with data mining and machine learning algorithms for the prediction of EGFR inhibitors: a comparative study; Rosa, Liliana Monteiro; Data mining; Machine learning; Epidermal growth factor receptor; PubChem |
author |
Rosa, Liliana Monteiro |
author_facet |
Rosa, Liliana Monteiro |
author_role |
author |
dc.contributor.none.fl_str_mv |
Castelli, Mauro; RUN |
dc.contributor.author.fl_str_mv |
Rosa, Liliana Monteiro |
dc.subject.por.fl_str_mv |
Data mining; Machine learning; Epidermal growth factor receptor; PubChem |
topic |
Data mining; Machine learning; Epidermal growth factor receptor; PubChem |
description |
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-04-04T13:38:45Z 2018-03-20 2018-03-20T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/33784 TID:201893614 |
url |
http://hdl.handle.net/10362/33784 |
identifier_str_mv |
TID:201893614 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799137925187764224 |