Representing Amino Acid Contacts In Protein Interfaces

Bibliographic details
Main author: Pires, João Paulo dos Santos
Publication date: 2020
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/107380
Abstract: Proteins are composed of twenty different types of amino acids, small organic molecules whose chemical and physical properties differ according to the groups of atoms they contain. Protein interactions are mediated by the affinity between groups of atoms belonging to amino acid residues at the surface of each protein, in the interface region. However, it is not clear at what level these contacts are best evaluated: by grouping similar amino acids together, by considering parts of each amino acid, or by considering individual atoms. The number of databanks and extracted features continues to increase. This yields very rich data, but it also raises the problem of the sheer number of different features and of what they really represent in the big picture of protein interactions. Because the data is collected by scientific communities all around the globe, there is a vast amount of information, but also a great diversity in the measured or calculated attributes. This creates a need to learn at which level these contacts occur and how best to combine the information in the literature to learn a valuable representation. Machine learning algorithms now make it possible to work with data in ways that were previously impractical, and various fields, bioinformatics among them, are using these algorithms to capture information that was inaccessible before. The goal of this work is to use unsupervised deep learning techniques to transform the data into a form that is intended to be informative and non-redundant, so that subsequent classification or regression algorithms can learn more easily and perform better on the processed data. The transformation involves finding encodings for the collected features that best capture which features are actually relevant to constructing these encodings. These encodings can be latent in relation to the information already known in the area, meaning that they will most likely lack interpretability for humans, but they can increase the performance of machine learning algorithms.
id RCAP_3e4dadaad53c84d36b4afdaba84dff75
oai_identifier_str oai:run.unl.pt:10362/107380
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling 2024-03-11T04:52:07Z; oai:run.unl.pt:10362/107380; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:40:57.695745; false
dc.title.none.fl_str_mv Representing Amino Acid Contacts In Protein Interfaces
title Representing Amino Acid Contacts In Protein Interfaces
author Pires, João Paulo dos Santos
author_facet Pires, João Paulo dos Santos
author_role author
dc.contributor.none.fl_str_mv Krippahl, Ludwig
RUN
dc.contributor.author.fl_str_mv Pires, João Paulo dos Santos
dc.subject.por.fl_str_mv Protein
Amino Acid
Atom
Protein Interface
Protein interaction
Deep Learning
Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Informatics Engineering
publishDate 2020
dc.date.none.fl_str_mv 2020-11-18T15:00:31Z
2020-07
2020
2020-07-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/107380
url http://hdl.handle.net/10362/107380
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138023266320384