Implementação e análise de modelos de solvatação para a predição ab initio de estruturas de proteínas

Bibliographic details
Main author: Rocha, Gregório Kappaun
Publication date: 2011
Document type: Master's thesis
Language: Portuguese (por)
Source title: Biblioteca Digital de Teses e Dissertações do LNCC
Full text: https://tede.lncc.br/handle/tede/156
Abstract: Predicting the native structure of a protein from its amino acid sequence is one of the major challenges of computational biology and carries a very high computational cost. Many efforts have been made to find efficient algorithms and simplified models for protein structure prediction. Including and correctly describing protein-solvent interactions is essential to the success of these methods, since the solvent plays a key role in the folding process and in the structural stability of proteins. Despite recent progress, modelling protein-solvent interactions in computer simulations remains a challenge. Implicit solvation models use different strategies to reproduce the effects of the solvent without representing its molecules discretely, relying instead on a direct estimate of the solvation free energy. This work aimed to implement and comparatively analyze implicit solvation models described in the literature, and to evaluate their impact on the predictive capacity of the GAPF protein structure prediction suite (Genetic Algorithms for Protein Folding), developed in our research group, GMMSB/LNCC. Because some of the solvation models require the solvent accessible surface area (SASA) in their calculations, it was also necessary to implement a method that combines accuracy with computational efficiency. The methodology of the POPS program was implemented for the SASA calculation, and the MSMS program was used as the reference for validation. Four solvation models were analyzed: EAS, I-SOLV, EEF1 and GBobc (used as reference). The thermal unfolding of 15 proteins via molecular dynamics (using the GROMACS simulation package) was carried out to evaluate the solvation models before placing them in the context of a protein structure prediction program. To evaluate the impact of each solvation model on GAPF, large-scale tests on the ab initio prediction of the structures of a set of 24 proteins were performed. The results show that: (i) the POPS methodology is a good alternative for calculating the SASA; (ii) the solvation models I-SOLV, EEF1 and GBobc reflect the behavior of the SASA in their solvation free energies; (iii) with the exception of EAS, all the models were able to discriminate folded from unfolded structures; (iv) no model was able to satisfactorily discriminate near-native structures from incorrectly folded structures with similar compactness but high RMSD; (v) I-SOLV and EEF1 were the solvation models that came closest to the reference model GBobc; (vi) I-SOLV and EEF1 improved the RMSD of the structures predicted by GAPF with respect to the experimental structures; (vii) I-SOLV and EAS have the lowest computational cost among the evaluated solvation models, being faster than GBobc. I-SOLV and EEF1 are the best alternatives among those studied for modelling solvation effects in protein structure prediction.
id LNCC_610e0841835448acda49d639a360b419
oai_identifier_str oai:tede-server.lncc.br:tede/156
network_acronym_str LNCC
network_name_str Biblioteca Digital de Teses e Dissertações do LNCC
repository_id_str
dc.title.por.fl_str_mv Implementação e análise de modelos de solvatação para a predição ab initio de estruturas de proteínas
dc.title.alternative.eng.fl_str_mv Implementation and analysis of solvation models for ab initio prediction of protein structures
author Rocha, Gregório Kappaun
author_facet Rocha, Gregório Kappaun
author_role author
dc.contributor.advisor1.fl_str_mv Dardenne, Laurent Emmanuel
dc.contributor.advisor1ID.fl_str_mv CPF:49809431104
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/8344194525615133
dc.contributor.advisor-co1.fl_str_mv Custódio, Fábio Lima
dc.contributor.advisor-co1ID.fl_str_mv CPF:08159264720
dc.contributor.advisor-co1Lattes.fl_str_mv http://lattes.cnpq.br/9126339190151859
dc.contributor.referee1.fl_str_mv Barbosa, Helio José Corrêa
dc.contributor.referee1ID.fl_str_mv CPF:194 306 716 34
dc.contributor.referee1Lattes.fl_str_mv http://lattes.cnpq.br/0375745110240885
dc.contributor.referee2.fl_str_mv Pascutti, Pedro Geraldo
dc.contributor.referee2Lattes.fl_str_mv http://lattes.cnpq.br/61425584109227273
dc.contributor.authorID.fl_str_mv CPF:12076685758
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/7690535205003366
dc.contributor.author.fl_str_mv Rocha, Gregório Kappaun
contributor_str_mv Dardenne, Laurent Emmanuel
Custódio, Fábio Lima
Barbosa, Helio José Corrêa
Pascutti, Pedro Geraldo
dc.subject.por.fl_str_mv Estrutura molecular
Bioinformática
topic Estrutura molecular
Bioinformática
Bioinformatics
Molecular structure
CNPQ::CIENCIAS BIOLOGICAS::BIOQUIMICA::BIOLOGIA MOLECULAR
dc.subject.eng.fl_str_mv Bioinformatics
Molecular structure
dc.subject.cnpq.fl_str_mv CNPQ::CIENCIAS BIOLOGICAS::BIOQUIMICA::BIOLOGIA MOLECULAR
publishDate 2011
dc.date.issued.fl_str_mv 2011-12-09
dc.date.available.fl_str_mv 2013-07-08
dc.date.accessioned.fl_str_mv 2015-03-04T18:57:48Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://tede.lncc.br/handle/tede/156
url https://tede.lncc.br/handle/tede/156
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Laboratório Nacional de Computação Científica
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Modelagem Computacional
dc.publisher.initials.fl_str_mv LNCC
dc.publisher.country.fl_str_mv BR
dc.publisher.department.fl_str_mv Coordenação de Pós-Graduação e Aperfeiçoamento (COPGA)
publisher.none.fl_str_mv Laboratório Nacional de Computação Científica
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações do LNCC
instname:Laboratório Nacional de Computação Científica (LNCC)
instacron:LNCC
instname_str Laboratório Nacional de Computação Científica (LNCC)
instacron_str LNCC
institution LNCC
reponame_str Biblioteca Digital de Teses e Dissertações do LNCC
collection Biblioteca Digital de Teses e Dissertações do LNCC
bitstream.url.fl_str_mv http://tede-server.lncc.br:8080/tede/bitstream/tede/156/1/Dissertacao_Corrigida_Final_Gregorio_2011.pdf
http://tede-server.lncc.br:8080/tede/bitstream/tede/156/2/Dissertacao_Corrigida_Final_Gregorio_2011.pdf.jpg
bitstream.checksum.fl_str_mv cb82d524114ee5f3199d1239571fb2cd
809ca746ac13e403b54c5c6decc6be63
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações do LNCC - Laboratório Nacional de Computação Científica (LNCC)
repository.mail.fl_str_mv library@lncc.br||library@lncc.br
_version_ 1797683217854103552