To be or NOT to be: The Impact of Negative Annotation in Biomedical Semantic Similarity
Main author: | Aveiro, Lina Andreia Gama |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10451/53810 |
Summary: | Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2022 |
id |
RCAP_65f1c930c6e1f17eb22156060c727937 |
oai_identifier_str |
oai:repositorio.ul.pt:10451/53810 |
network_acronym_str |
RCAP |
repository_id_str |
7160 |
abstract |
Classical semantic similarity measures did not consider negative annotations in similarity computation, and the impact these annotations can have on this data mining technique is not well studied. This work therefore aims to understand how adding negative annotations affects semantic similarity. To do so, two pairwise similarity measures, Best-Match Average (BMA) and Resnik, were adapted to create the polar measures PolarBMA and PolarResnik. These were evaluated against the original measures in two currently relevant scopes: protein-protein interaction (PPI) prediction and disease prediction. Pairs of proteins known either to interact or not to interact were taken from STRING and enriched with positive and negative annotations from the Gene Ontology. Synthetic patients were created as sets of annotations taken from the Mendelian diseases they were designed to have, with possible noise or imprecise annotations added. Semantic similarity was then computed with both polar and non-polar measures between the proteins in each pair, and between each patient and a set of candidate diseases comprising the patient's Mendelian disease and random diseases taken from the Human Phenotype Ontology. To evaluate how the polar measures performed relative to the baseline, a ranking by semantic similarity was produced for each measure and scope, and the cumulative frequencies of the ranks were plotted. ROC AUC and precision-recall curves were also computed for PPI prediction, and average precision for the disease prediction dataset.
In PPI prediction, the polar measures performed better in the Molecular Function branch in both experiments where negative annotations were added, and also in one of the experiments with the Cellular Component branch. In disease prediction, the polar measures improved performance by approximately ten percent. This improvement held in all disease prediction experiments, even with added noise and imprecision. Given these results, this work concludes that negative annotations have an impact on semantic similarity, but the magnitude of this impact requires further study. |
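The abstract names the two baseline measures that PolarResnik and PolarBMA adapt. As a minimal sketch of those baselines only, the Python below computes Resnik similarity (the information content, IC, of the most informative common ancestor of two terms), combines term scores with Best-Match Average, ranks candidate diseases for a patient, and computes the cumulative rank frequencies used in the evaluation. The toy ontology, the term and disease names, and the choice to estimate IC from the fraction of annotated entities per term are all assumptions for illustration, not taken from the thesis. The polar variants are not reproduced, because this record does not define how negative annotations enter the computation.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy ontology: each term maps to its direct parents.
# These are illustrative labels, not real GO or HPO identifiers.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    result = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def information_content(corpus: List[Set[str]]) -> Dict[str, float]:
    """Assumed IC estimate: IC(t) = -log p(t), where p(t) is the fraction of
    entities whose annotations, propagated to ancestors, include t."""
    counts = {t: 0 for t in PARENTS}
    for annotations in corpus:
        for t in set().union(*(ancestors(a) for a in annotations)):
            counts[t] += 1
    n = len(corpus)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(t1: str, t2: str, ic: Dict[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor.
    Assumes both terms (hence all their ancestors) occur in the IC corpus."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


def bma(terms1: Set[str], terms2: Set[str], ic: Dict[str, float]) -> float:
    """Best-Match Average: mean best match of each term in the other set,
    averaged over both directions."""
    best1 = [max(resnik(a, b, ic) for b in terms2) for a in terms1]
    best2 = [max(resnik(a, b, ic) for a in terms1) for b in terms2]
    return (sum(best1) / len(best1) + sum(best2) / len(best2)) / 2


def rank_of_true(patient: Set[str], candidates: Dict[str, Set[str]],
                 true_disease: str, ic: Dict[str, float]) -> int:
    """1-based rank of the true disease when candidates are sorted by
    decreasing BMA similarity to the patient (ties keep insertion order)."""
    ranked = sorted(candidates, key=lambda d: bma(patient, candidates[d], ic),
                    reverse=True)
    return ranked.index(true_disease) + 1


def cumulative_rank_frequency(ranks: List[int], max_rank: int) -> List[float]:
    """Fraction of queries whose true item is ranked at position k or better,
    for k = 1..max_rank."""
    return [sum(r <= k for r in ranks) / len(ranks) for k in range(1, max_rank + 1)]


if __name__ == "__main__":
    corpus = [{"A1"}, {"A2"}, {"B1"}, {"A1", "B1"}]
    ic = information_content(corpus)
    candidates = {"d1": {"A1"}, "d2": {"B1"}, "d3": {"A2"}}
    print(rank_of_true({"A2"}, candidates, "d3", ic))  # 1
    print(cumulative_rank_frequency([1, 2, 1, 4], 4))  # [0.5, 0.75, 0.75, 1.0]
```

Ties in similarity are broken here by the candidates' insertion order (Python's sort is stable, including with `reverse=True`), which is an arbitrary choice; an evaluation on real data would need an explicit tie-handling policy.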
author |
Aveiro, Lina Andreia Gama |
dc.contributor.none.fl_str_mv |
Pesquita, Cátia, 1980-; Repositório da Universidade de Lisboa |
dc.subject.por.fl_str_mv |
Semantic similarity; Biomedical ontology; Negative annotation; Protein-protein interaction prediction; Disease prediction; Master's theses - 2022; Departamento de Informática |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021 2022-07-18T08:16:57Z 2022 2022-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/53810 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |