Representing Amino Acid Contacts In Protein Interfaces

Pires, João Paulo dos Santos

Bibliographic details
Main author:	Pires, João Paulo dos Santos
Publication date:	2020
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10362/107380
Abstract:	Proteins are composed of twenty different types of amino acids, small organic molecules whose chemical and physical properties differ according to the groups of atoms they contain. Protein interactions are mediated by the affinity between groups of atoms belonging to amino acid residues at the surface of each protein, in the interface region. However, it is not clear at what level these contacts are best evaluated: by grouping similar amino acids together, by considering parts of each amino acid, or by considering individual atoms. The number of databanks and extracted features continues to increase. This yields very rich data, but it also raises the problem of the sheer number of different features and of what they really represent in the bigger picture of protein interactions. Since the data are collected by scientific communities all around the globe, there is a vast amount of information, but also a great diversity in the measured or calculated attributes. This creates a need to learn at which level these contacts occur and how best to combine the information in the literature to learn a valuable representation. Machine learning algorithms now make it possible to work with data in ways that were previously impractical, and various fields, bioinformatics among them, are using these algorithms to capture information that was inaccessible before. The goal of this work is to use unsupervised deep learning techniques to transform the data into a representation that is intended to be informative and non-redundant, facilitating subsequent learning by classification or regression algorithms, which should perform better on data processed in this way. The transformation involves finding encodings for the collected features and identifying which of those features are actually relevant to constructing the encodings. These encodings can be latent with respect to existing knowledge in the area, meaning that they will most likely lack interpretability for humans, but they can increase the performance of machine learning algorithms.

Item metadata

id	RCAP_3e4dadaad53c84d36b4afdaba84dff75
oai_identifier_str	oai:run.unl.pt:10362/107380
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Representing Amino Acid Contacts In Protein Interfaces
author	Pires, João Paulo dos Santos
author_facet	Pires, João Paulo dos Santos
author_role	author
dc.contributor.none.fl_str_mv	Krippahl, Ludwig; RUN
dc.contributor.author.fl_str_mv	Pires, João Paulo dos Santos
dc.subject.por.fl_str_mv	Protein; Amino Acid; Atom; Protein Interface; Protein interaction; Deep Learning; Scientific Domain/Area::Engineering and Technology::Electrical Engineering, Electronics and Informatics
topic	Protein; Amino Acid; Atom; Protein Interface; Protein interaction; Deep Learning; Scientific Domain/Area::Engineering and Technology::Electrical Engineering, Electronics and Informatics
publishDate	2020
dc.date.none.fl_str_mv	2020-11-18T15:00:31Z 2020-07 2020 2020-07-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/107380
url	http://hdl.handle.net/10362/107380
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799138023266320384
