Optimizing Data Selection for Contact Prediction in Proteins

Bibliographic details
Main author: Fial, Guilherme José Gago
Publication date: 2019
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/91154
Abstract: Proteins are essential to life in all organisms. They act as enzymes, antibodies, transporters of molecules, and structural elements, among other important roles. Their ability to interact selectively with specific molecules is what makes them important. Understanding these interactions can provide many advantages in fields such as drug design and metabolic engineering. Current methods of predicting protein interaction attempt to fit the structures of two proteins together geometrically by generating a large number of potential configurations and then discriminating the correct pose from the rest. Given the large search space, approaches that reduce the complexity are often employed. Identifying a contact point between the pairing proteins is a good constraining factor: if at least one contact can be predicted among a small set of possibilities (e.g. 100), the search space is significantly reduced. A machine learning predictor for this task can be developed using structural and evolutionary information about the interacting proteins. Such evolutionary measures are computed over a substantial number of homologous sequences, which can be filtered and ordered in many different ways. Accordingly, a machine learning solution was developed that focuses on measuring the effects that different homolog arrangements have on the final prediction.
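The success criterion mentioned in the abstract (at least one true contact among a small set of candidates, e.g. 100) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `has_hit_in_top_k`, the representation of a candidate as a pair of residue indices, and the toy scores are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (residue index in protein A, residue index in protein B)
Pair = Tuple[int, int]


def has_hit_in_top_k(scored_pairs: Iterable[Tuple[Pair, float]],
                     true_contacts: Set[Pair],
                     k: int = 100) -> bool:
    """Return True if any of the k highest-scoring pairs is a true contact."""
    # Rank candidate pairs by predictor score, highest first.
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    # One true contact within the top k is enough to count as a success.
    return any(pair in true_contacts for pair, _ in ranked[:k])


# Toy data (entirely made up): three candidate pairs with predictor scores.
scores = [((3, 17), 0.91), ((8, 2), 0.40), ((5, 5), 0.75)]
contacts = {(5, 5)}

top1_hit = has_hit_in_top_k(scores, contacts, k=1)
top2_hit = has_hit_in_top_k(scores, contacts, k=2)
```

With the toy data, the only true contact is ranked second, so the check fails for k=1 and succeeds for k=2. Averaging this check over many protein pairs would give one possible way to compare predictors trained on different homolog selections.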
id RCAP_7cec65f0339dfc12ab2c4a96f9a70a07
oai_identifier_str oai:run.unl.pt:10362/91154
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Optimizing Data Selection for Contact Prediction in Proteins
title Optimizing Data Selection for Contact Prediction in Proteins
author Fial, Guilherme José Gago
author_facet Fial, Guilherme José Gago
author_role author
dc.contributor.none.fl_str_mv Krippahl, Ludwig
RUN
dc.contributor.author.fl_str_mv Fial, Guilherme José Gago
dc.subject.por.fl_str_mv Contact prediction
Machine learning
Bioinformatics
Protein-Protein Interactions
Domain/Scientific Area::Engineering and Technology::Materials Engineering
publishDate 2019
dc.date.none.fl_str_mv 2019
2019
2019-01-01T00:00:00Z
2020-01-14T10:48:56Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/91154
url http://hdl.handle.net/10362/91154
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137989612273665