Assessing the efficiency of multiple sequence alignment programs.

Bibliographic details
Main author: Pais, Fabiano Sviatopolk Mirsky
Publication date: 2014
Other authors: Ruy, Patrícia de Cássia; Oliveira, Guilherme Corrêa de; Coimbra, Roney Santos
Document type: Article
Language: eng
Source title: Repositório Institucional da FIOCRUZ (ARCA)
Full text: https://www.arca.fiocruz.br/handle/icict/9427
Summary: Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brazil/Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou Grupo de Genômica Biologia Computacional. Belo Horizonte, MG, Brazil
id CRUZ_978019414dc223aa601053e9c1b980af
oai_identifier_str oai:www.arca.fiocruz.br:icict/9427
network_acronym_str CRUZ
network_name_str Repositório Institucional da FIOCRUZ (ARCA)
repository_id_str 2135
spelling BACKGROUND: Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology, and several programs and algorithms are available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next-generation deep sequencing platforms and the increasing demand for large-scale data analysis, it is imperative to optimize the application of software. Therefore, a balance between alignment accuracy and computational cost has become a critical indicator of the most suitable MSA program. We compared both the accuracy and the cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE and discuss the relevance of some implementations embedded in each program's algorithm. Accuracy of alignment was calculated with the two standard scoring functions provided by BAliBASE, the sum-of-pairs and total-column scores, and computational costs were determined by collecting peak memory usage and time of execution.
RESULTS: Our results indicate that, for the most part, the consistency-based programs Probcons, T-Coffee, Probalign and MAFFT outperformed the other programs in accuracy. Whenever sequences with large N/C-terminal extensions were present in the BAliBASE suite, Probalign, MAFFT and also CLUSTAL OMEGA outperformed Probcons and T-Coffee. The drawback of these programs is that they are more memory-greedy and slower than POA, CLUSTALW, DIALIGN-TX and MUSCLE. CLUSTALW and MUSCLE were the fastest programs, with CLUSTALW being the least memory-demanding program.
CONCLUSIONS: Based on the results presented herein, all four programs Probcons, T-Coffee, Probalign and MAFFT are recommended for better accuracy of multiple sequence alignments. T-Coffee and recent versions of MAFFT can deliver faster and reliable alignments, which are especially suited for datasets larger than those encountered in the BAliBASE suite, if multi-core computers are available.
In fact, parallelization of alignmen...
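The abstract in this record names two accuracy measures, the sum-of-pairs (SP) and total-column (TC) scores. The following is a minimal sketch of how such scores can be computed, not the BAliBASE scoring program itself. It assumes alignments are given as equal-length strings with `-` for gaps, scores every reference column rather than only annotated core blocks, and uses hypothetical function names (`columns`, `aligned_pairs`, `sp_score`, `tc_score`) and toy sequences invented for illustration.

```python
from itertools import combinations


def columns(alignment, gap="-"):
    """Turn aligned rows into columns of per-sequence residue indices (None = gap)."""
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[c] == gap:
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(cols):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for col in cols:
        present = [(s, i) for s, i in enumerate(col) if i is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(reference, test):
    """Fraction of residue pairs aligned in the reference that the test also aligns."""
    ref_pairs = aligned_pairs(columns(reference))
    test_pairs = aligned_pairs(columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(reference, test):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


# Toy alignments (hypothetical data, not taken from BAliBASE).
reference = ["AC-GT", "ACAGT", "A--GT"]
test = ["ACG-T", "ACAGT", "A-G-T"]
print(sp_score(reference, test), tc_score(reference, test))
```

Both scores range from 0 to 1. TC is the stricter of the two, since a single misplaced residue invalidates its whole column while leaving the other pairs in that column intact for SP.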
dc.title.pt_BR.fl_str_mv Assessing the efficiency of multiple sequence alignment programs.
title Assessing the efficiency of multiple sequence alignment programs.
spellingShingle Assessing the efficiency of multiple sequence alignment programs.
Pais, Fabiano Sviatopolk Mirsky
Multiple sequence alignment
Computer programs
Accuracy
Performance
title_short Assessing the efficiency of multiple sequence alignment programs.
title_full Assessing the efficiency of multiple sequence alignment programs.
title_fullStr Assessing the efficiency of multiple sequence alignment programs.
title_full_unstemmed Assessing the efficiency of multiple sequence alignment programs.
title_sort Assessing the efficiency of multiple sequence alignment programs.
author Pais, Fabiano Sviatopolk Mirsky
author_facet Pais, Fabiano Sviatopolk Mirsky
Ruy, Patrícia de Cássia
Oliveira, Guilherme Corrêa de
Coimbra, Roney Santos
author_role author
author2 Ruy, Patrícia de Cássia
Oliveira, Guilherme Corrêa de
Coimbra, Roney Santos
author2_role author
author
author
dc.contributor.author.fl_str_mv Pais, Fabiano Sviatopolk Mirsky
Ruy, Patrícia de Cássia
Oliveira, Guilherme Corrêa de
Coimbra, Roney Santos
dc.subject.en.pt_BR.fl_str_mv Multiple sequence alignment
Computer programs
Accuracy
Performance
topic Multiple sequence alignment
Computer programs
Accuracy
Performance
description Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brazil/Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou Grupo de Genômica Biologia Computacional. Belo Horizonte, MG, Brazil
publishDate 2014
dc.date.issued.fl_str_mv 2014
dc.date.accessioned.fl_str_mv 2015-02-04T12:30:02Z
dc.date.available.fl_str_mv 2015-02-04T12:30:02Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.citation.fl_str_mv PAIS, Fabiano Sviatopolk Mirsky et al. Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol, V.9, n. 1, p. 4, 2014.
dc.identifier.uri.fl_str_mv https://www.arca.fiocruz.br/handle/icict/9427
dc.identifier.issn.none.fl_str_mv 1748-7188
dc.identifier.doi.none.fl_str_mv 10.1186/1748-7188-9-4
identifier_str_mv PAIS, Fabiano Sviatopolk Mirsky et al. Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol, V.9, n. 1, p. 4, 2014.
1748-7188
10.1186/1748-7188-9-4
url https://www.arca.fiocruz.br/handle/icict/9427
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv BioMed Central
publisher.none.fl_str_mv BioMed Central
dc.source.none.fl_str_mv reponame:Repositório Institucional da FIOCRUZ (ARCA)
instname:Fundação Oswaldo Cruz (FIOCRUZ)
instacron:FIOCRUZ
instname_str Fundação Oswaldo Cruz (FIOCRUZ)
instacron_str FIOCRUZ
institution FIOCRUZ
reponame_str Repositório Institucional da FIOCRUZ (ARCA)
collection Repositório Institucional da FIOCRUZ (ARCA)
bitstream.url.fl_str_mv https://www.arca.fiocruz.br/bitstream/icict/9427/1/license.txt
https://www.arca.fiocruz.br/bitstream/icict/9427/2/2014_035.pdf
https://www.arca.fiocruz.br/bitstream/icict/9427/3/2014_035.pdf.txt
bitstream.checksum.fl_str_mv 7d48279ffeed55da8dfe2f8e81f3b81f
2fa775121df1fd4b23c2255d7c7684fc
ecb192f8a1d6cff6f4cc99b1a9dacfdf
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ)
repository.mail.fl_str_mv repositorio.arca@fiocruz.br
_version_ 1813009185038663680
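
The abstract in this record reports computational cost as peak memory usage and execution time. The record does not say how the authors collected these figures, so the following is only one possible sketch. It assumes a Unix-like system and Python 3.9 or later; the helper name `run_and_measure` and the stand-in workload are hypothetical. In practice the command list would be replaced by the invocation of an alignment program.

```python
import os
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run cmd and return (exit_code, wall_seconds, peak_rss) for that child."""
    start = time.perf_counter()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # os.wait4 reaps this specific child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # Record the exit code on the Popen object, since we reaped the child ourselves.
    proc.returncode = os.waitstatus_to_exitcode(status)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return proc.returncode, elapsed, usage.ru_maxrss


# Stand-in workload (hypothetical): a child Python process allocating about 20 MB.
demo_cmd = [sys.executable, "-c", "data = bytearray(20_000_000)"]
exit_code, seconds, peak_rss = run_and_measure(demo_cmd)
print(f"exit={exit_code} time={seconds:.3f}s peak_rss={peak_rss}")
```

Using `os.wait4` rather than a process-wide counter keeps the peak-memory figure specific to one run, which matters when several programs are benchmarked from the same driver script.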