Representing Amino Acid Contacts In Protein Interfaces
Main author: | Pires, João Paulo dos Santos |
---|---|
Publication date: | 2020 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/107380 |
Abstract: | Proteins are composed of twenty different types of amino acids, small organic molecules whose chemical and physical properties differ according to the groups of atoms they contain. Protein interactions are mediated by the affinity between groups of atoms belonging to amino acid residues at the surface of each protein, in the interface region. However, it is not clear at what level these contacts are best evaluated: by grouping similar amino acids together, by considering parts of each amino acid, or by considering individual atoms. The number of databanks and extracted features continues to increase. This yields very rich data, but it also raises the problem of the sheer number of different features and of what they really represent in the big picture of protein interactions. Because the data are collected by scientific communities around the globe, there is a vast amount of information, but also great diversity in the measured or calculated attributes. This creates a need to learn at which level these contacts occur and how best to combine the information in the literature into a valuable representation. Machine learning algorithms have made it possible to work with data in ways that were previously impractical, and many fields, bioinformatics among them, now use these algorithms to capture information that was inaccessible before. The goal of this work is to use unsupervised deep learning techniques to transform the data into a representation that is intended to be informative and non-redundant, facilitating subsequent learning by classification or regression algorithms, which are expected to perform better on data processed in this way. The transformation involves finding encodings of the collected features that capture which features are actually relevant. These encodings can be latent with respect to existing knowledge in the area, meaning that they will most likely lack interpretability for humans, but they can increase the performance of machine learning algorithms. |
id |
RCAP_3e4dadaad53c84d36b4afdaba84dff75 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/107380 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Representing Amino Acid Contacts In Protein Interfaces |
author |
Pires, João Paulo dos Santos |
author_role |
author |
dc.contributor.none.fl_str_mv |
Krippahl, Ludwig RUN |
dc.contributor.author.fl_str_mv |
Pires, João Paulo dos Santos |
dc.subject.por.fl_str_mv |
Protein; Amino Acid; Atom; Protein Interface; Protein interaction; Deep Learning; Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-11-18T15:00:31Z 2020-07 2020 2020-07-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/107380 |
url |
http://hdl.handle.net/10362/107380 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799138023266320384 |