ProtASR: An Evolutionary Framework for Ancestral Protein Reconstruction with Selection on Folding Stability

Bibliographic details
Main author: Arenas, M
Publication date: 2017
Other authors: Weber, CC, Liberles, DA, Bastolla, U
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/10216/156999
Abstract: The computational reconstruction of ancestral proteins provides information on past biological events and has practical implications for biomedicine and biotechnology. Currently available tools for ancestral sequence reconstruction (ASR) are often based on empirical amino acid substitution models that assume that all sites evolve at the same rate and under the same process. However, this assumption is frequently violated because protein evolution is highly heterogeneous due to different selective constraints among sites. Here, we present ProtASR, a new evolutionary framework to infer ancestral protein sequences accounting for selection on protein stability. First, ProtASR generates site-specific substitution matrices through the structurally constrained mean-field (MF) substitution model, which considers both unfolding and misfolding stability. We previously showed that MF models outperform empirical amino acid substitution models, as well as other structurally constrained substitution models, both in terms of likelihood and correctly inferring amino acid distributions across sites. In the second step, ProtASR adapts a well-established maximum-likelihood (ML) ASR procedure to infer ancestral proteins under MF models. A known bias of ML ASR methods is that they tend to overestimate the stability of ancestral proteins by underestimating the frequency of deleterious mutations. We compared ProtASR under MF to two empirical substitution models (JTT and CAT), reconstructing the ancestral sequences of simulated proteins. ProtASR yields reconstructed proteins with less biased stabilities, which are significantly closer to those of the simulated proteins. Analysis of extant protein families suggests that folding stability evolves through time across protein families, potentially reflecting neutral fluctuation. Some families exhibit a more constant protein folding stability, while others are more variable. ProtASR is freely available from https://github.com/miguelarenas/protasr and includes detailed documentation and ready-to-use examples. It runs in seconds/minutes depending on protein length and alignment size. [Ancestral sequence reconstruction; folding stability; molecular adaptation; phylogenetics; protein evolution; protein structure.]
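Illustrative note: the two-step procedure summarized in the abstract (site-specific substitution matrices, then maximum-likelihood reconstruction) can be illustrated with a toy sketch. It is not ProtASR's implementation and does not use the MF model. It assumes a much simpler F81-style process in which each site has its own equilibrium frequencies, and it computes the marginal posterior of the root state at a single site with Felsenstein's pruning algorithm. The function names, the tree encoding, and the frequencies chosen for the example site are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(preferred, background=0.3):
    """Build hypothetical equilibrium frequencies for one site.

    `preferred` maps residues to probabilities summing to 1 - background;
    the remaining `background` mass is spread evenly over the other residues.
    """
    others = [a for a in AMINO_ACIDS if a not in preferred]
    freqs = {a: background / len(others) for a in others}
    freqs.update(preferred)
    return freqs


def transition_prob(a, b, t, freqs, rate=1.0):
    """P(a -> b after time t) under an F81-style model (assumed, not MF)."""
    decay = math.exp(-rate * t)
    return decay * (1.0 if a == b else 0.0) + (1.0 - decay) * freqs[b]


def conditional_likelihoods(node, freqs):
    """Felsenstein pruning: P(observed leaves below node | state at node).

    A node is either a leaf (a one-letter string) or a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {a: 1.0 if a == node else 0.0 for a in AMINO_ACIDS}
    result = {a: 1.0 for a in AMINO_ACIDS}
    for child, t in node:
        child_like = conditional_likelihoods(child, freqs)
        for a in AMINO_ACIDS:
            result[a] *= sum(
                transition_prob(a, b, t, freqs) * child_like[b]
                for b in AMINO_ACIDS
            )
    return result


def marginal_root_posterior(tree, freqs):
    """Posterior probability of each residue at the root for one site."""
    like = conditional_likelihoods(tree, freqs)
    weighted = {a: freqs[a] * like[a] for a in AMINO_ACIDS}
    total = sum(weighted.values())
    return {a: w / total for a, w in weighted.items()}


# Hypothetical buried site that favors hydrophobic residues.
buried_site = site_frequencies({"L": 0.3, "I": 0.2, "V": 0.2})
# Tree ((L:0.1, I:0.1):0.2, V:0.3) for a single alignment column.
tree = [([("L", 0.1), ("I", 0.1)], 0.2), ("V", 0.3)]
posterior = marginal_root_posterior(tree, buried_site)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Under these assumptions, changing the per-site frequencies changes which ancestral residue is most probable. This is the general mechanism by which a site-specific model can yield different reconstructions from a single empirical matrix shared by all sites.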
id RCAP_5d7e760ba650ffb54c0b5493ffe788b7
oai_identifier_str oai:repositorio-aberto.up.pt:10216/156999
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.contributor.author.fl_str_mv Arenas, M
Weber, CC
Liberles, DA
Bastolla, U
dc.subject.por.fl_str_mv Ancestral sequence reconstruction
Protein evolution
Molecular adaptation
Phylogenetics
Folding stability
Protein structure
publishDate 2017
dc.date.none.fl_str_mv 2017
2017-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/156999
url https://hdl.handle.net/10216/156999
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1063-5157
10.1093/sysbio/syw121
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Society of Systematic Biologists
publisher.none.fl_str_mv Society of Systematic Biologists
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação