Structural and semantic similarity metrics for chemical compound classification
Main author: | Ferreira, João D |
---|---|
Publication date: | 2010 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10451/13866 |
Abstract: | Over the last few decades, there has been an increasing number of attempts at creating systems capable of comparing and classifying chemical compounds based on their structure and/or physicochemical properties. While the rate of success of these approaches has been increasing, particularly with the introduction of new and ever more sophisticated methods of machine learning, there is still room for improvement. One of the problems of these methods is that they fail to consider that similar molecules may have different roles in nature, or, to a lesser extent, that disparate molecules may have similar roles. This thesis proposes the exploitation of the semantic properties of chemical compounds, as described in the ChEBI ontology, to create an efficient system able to automatically deal with the binary classification of chemical compounds. To that effect, I developed Chym (Chemical Hybrid Metric) as a tool that integrates structural and semantic information in a unique hybrid metric. The work presented here shows substantial evidence supporting the effectiveness of Chym, since it has outperformed all the models with which it was compared. In particular, it achieved accuracy values of 90.9%, 87.7% and 84.2% when solving three classification problems which, previously, had only been solved with accuracy values of 81.5%, 80.6% and 82.8% respectively. Other results show that the tool is appropriate to use even if the problem at hand is not well represented in the ChEBI ontology. Thus, Chym shows that considering the semantic properties of a compound helps solve classification problems. Therefore, Chym can be used in projects that require the classification and/or the comparison of chemical compounds, such as the study of the evolution of metabolic pathways, drug discovery or preliminary toxicity analysis. |
id |
RCAP_6a5ae3c7b87f7b642d161d29a5cc8bfa |
---|---|
oai_identifier_str |
oai:repositorio.ul.pt:10451/13866 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Structural and semantic similarity metrics for chemical compound classification |
title |
Structural and semantic similarity metrics for chemical compound classification |
author |
Ferreira, João D |
author_facet |
Ferreira, João D |
author_role |
author |
dc.contributor.none.fl_str_mv |
Couto, Francisco M; Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv |
Ferreira, João D |
dc.subject.por.fl_str_mv |
Chemical compound similarity; Machine learning; Ontologies; Semantic similarity |
topic |
Chemical compound similarity; Machine learning; Ontologies; Semantic similarity |
description |
Over the last few decades, there has been an increasing number of attempts at creating systems capable of comparing and classifying chemical compounds based on their structure and/or physicochemical properties. While the rate of success of these approaches has been increasing, particularly with the introduction of new and ever more sophisticated methods of machine learning, there is still room for improvement. One of the problems of these methods is that they fail to consider that similar molecules may have different roles in nature, or, to a lesser extent, that disparate molecules may have similar roles. This thesis proposes the exploitation of the semantic properties of chemical compounds, as described in the ChEBI ontology, to create an efficient system able to automatically deal with the binary classification of chemical compounds. To that effect, I developed Chym (Chemical Hybrid Metric) as a tool that integrates structural and semantic information in a unique hybrid metric. The work presented here shows substantial evidence supporting the effectiveness of Chym, since it has outperformed all the models with which it was compared. In particular, it achieved accuracy values of 90.9%, 87.7% and 84.2% when solving three classification problems which, previously, had only been solved with accuracy values of 81.5%, 80.6% and 82.8% respectively. Other results show that the tool is appropriate to use even if the problem at hand is not well represented in the ChEBI ontology. Thus, Chym shows that considering the semantic properties of a compound helps solve classification problems. Therefore, Chym can be used in projects that require the classification and/or the comparison of chemical compounds, such as the study of the evolution of metabolic pathways, drug discovery or preliminary toxicity analysis. |
publishDate |
2010 |
dc.date.none.fl_str_mv |
2010-07-22T11:14:42Z; 2010; 2010-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/13866 |
url |
http://hdl.handle.net/10451/13866 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1817549955872587776 |