Structural and semantic similarity metrics for chemical compound classification

Detalhes bibliográficos
Autor(a) principal: Ferreira, João D
Data de Publicação: 2010
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10451/13866
Resumo: Over the last few decades, there has been an increasing number of attempts at creating systems capable of comparing and classifying chemical compounds based on their structure and/or physicochemical properties. While the rate of success of these approaches has been increasing, particularly with the introduction of new and ever more sophisticated methods of machine learning, there is still room for improvement. One of the problems of these methods is that they fail to consider that similar molecules may have di erent roles in nature, or, to a lesser extend, that disparate molecules may have similar roles. This thesis proposes the exploitation of the semantic properties of chemical compounds, as described in the ChEBI ontology, to create an e cient system able to automatically deal with the binary classi cation of chemical compounds. To that e ect, I developed Chym (Chemical Hybrid Metric) as a tool that integrates structural and semantic information in a unique hybrid metric. The work here presented shows substantial evidence supporting the e ectiveness of Chym, since it has outperformed all the models with which it was compared. Particularly, it achieved accuracy values of 90.9%, 87.7% and 84.2% when solving three classi cation problems which, previously, had only been solved with accuracy values of 81.5%, 80.6% and 82.8% respectively. Other results show that the tool is appropriate to use even if the problem at hand is not well represented in the ChEBI ontology. Thus, Chym shows that considering the semantic properties of a compound helps solving classi cation problems. Therefore, Chym can be used in projects that require the classi cation and/or the comparison of chemical compounds, such as the study of the evolution of metabolic pathways, drug discovery or in preliminary toxicity analysis.
id RCAP_6a5ae3c7b87f7b642d161d29a5cc8bfa
oai_identifier_str oai:repositorio.ul.pt:10451/13866
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Structural and semantic similarity metrics for chemical compound classificationChemical compound similarityMachine learningOntologiesSemantic similarityOver the last few decades, there has been an increasing number of attempts at creating systems capable of comparing and classifying chemical compounds based on their structure and/or physicochemical properties. While the rate of success of these approaches has been increasing, particularly with the introduction of new and ever more sophisticated methods of machine learning, there is still room for improvement. One of the problems of these methods is that they fail to consider that similar molecules may have di erent roles in nature, or, to a lesser extend, that disparate molecules may have similar roles. This thesis proposes the exploitation of the semantic properties of chemical compounds, as described in the ChEBI ontology, to create an e cient system able to automatically deal with the binary classi cation of chemical compounds. To that e ect, I developed Chym (Chemical Hybrid Metric) as a tool that integrates structural and semantic information in a unique hybrid metric. The work here presented shows substantial evidence supporting the e ectiveness of Chym, since it has outperformed all the models with which it was compared. Particularly, it achieved accuracy values of 90.9%, 87.7% and 84.2% when solving three classi cation problems which, previously, had only been solved with accuracy values of 81.5%, 80.6% and 82.8% respectively. Other results show that the tool is appropriate to use even if the problem at hand is not well represented in the ChEBI ontology. Thus, Chym shows that considering the semantic properties of a compound helps solving classi cation problems. Therefore, Chym can be used in projects that require the classi cation and/or the comparison of chemical compounds, such as the study of the evolution of metabolic pathways, drug discovery or in preliminary toxicity analysis.Couto, Francisco MRepositório da Universidade de LisboaFerreira, João D2010-07-22T11:14:42Z20102010-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10451/13866enginfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-11-08T15:59:17Zoai:repositorio.ul.pt:10451/13866Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T21:35:47.003020Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Structural and semantic similarity metrics for chemical compound classification
title Structural and semantic similarity metrics for chemical compound classification
spellingShingle Structural and semantic similarity metrics for chemical compound classification
Ferreira, João D
Chemical compound similarity
Machine learning
Ontologies
Semantic similarity
title_short Structural and semantic similarity metrics for chemical compound classification
title_full Structural and semantic similarity metrics for chemical compound classification
title_fullStr Structural and semantic similarity metrics for chemical compound classification
title_full_unstemmed Structural and semantic similarity metrics for chemical compound classification
title_sort Structural and semantic similarity metrics for chemical compound classification
author Ferreira, João D
author_facet Ferreira, João D
author_role author
dc.contributor.none.fl_str_mv Couto, Francisco M
Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Ferreira, João D
dc.subject.por.fl_str_mv Chemical compound similarity
Machine learning
Ontologies
Semantic similarity
topic Chemical compound similarity
Machine learning
Ontologies
Semantic similarity
description Over the last few decades, there has been an increasing number of attempts at creating systems capable of comparing and classifying chemical compounds based on their structure and/or physicochemical properties. While the rate of success of these approaches has been increasing, particularly with the introduction of new and ever more sophisticated methods of machine learning, there is still room for improvement. One of the problems of these methods is that they fail to consider that similar molecules may have di erent roles in nature, or, to a lesser extend, that disparate molecules may have similar roles. This thesis proposes the exploitation of the semantic properties of chemical compounds, as described in the ChEBI ontology, to create an e cient system able to automatically deal with the binary classi cation of chemical compounds. To that e ect, I developed Chym (Chemical Hybrid Metric) as a tool that integrates structural and semantic information in a unique hybrid metric. The work here presented shows substantial evidence supporting the e ectiveness of Chym, since it has outperformed all the models with which it was compared. Particularly, it achieved accuracy values of 90.9%, 87.7% and 84.2% when solving three classi cation problems which, previously, had only been solved with accuracy values of 81.5%, 80.6% and 82.8% respectively. Other results show that the tool is appropriate to use even if the problem at hand is not well represented in the ChEBI ontology. Thus, Chym shows that considering the semantic properties of a compound helps solving classi cation problems. Therefore, Chym can be used in projects that require the classi cation and/or the comparison of chemical compounds, such as the study of the evolution of metabolic pathways, drug discovery or in preliminary toxicity analysis.
publishDate 2010
dc.date.none.fl_str_mv 2010-07-22T11:14:42Z
2010
2010-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/13866
url http://hdl.handle.net/10451/13866
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1817549955872587776