Assessing the efficiency of multiple sequence alignment programs.
Main author: | Pais, Fabiano Sviatopolk Mirsky |
---|---|
Publication date: | 2014 |
Other authors: | Ruy, Patrícia de Cássia; Oliveira, Guilherme Corrêa de; Coimbra, Roney Santos |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da FIOCRUZ (ARCA) |
Full text: | https://www.arca.fiocruz.br/handle/icict/9427 |
Abstract: | Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brazil/Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou Grupo de Genômica Biologia Computacional. Belo Horizonte, MG, Brazil |
id |
CRUZ_978019414dc223aa601053e9c1b980af |
---|---|
oai_identifier_str |
oai:www.arca.fiocruz.br:icict/9427 |
network_acronym_str |
CRUZ |
network_name_str |
Repositório Institucional da FIOCRUZ (ARCA) |
repository_id_str |
2135 |
spelling |
Pais, Fabiano Sviatopolk Mirsky; Ruy, Patrícia de Cássia; Oliveira, Guilherme Corrêa de; Coimbra, Roney Santos. 2015-02-04T12:30:02Z. 2014. PAIS, Fabiano Sviatopolk Mirsky et al. Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol, V.9, n. 1, p. 4, 2014. ISSN 1748-7188. https://www.arca.fiocruz.br/handle/icict/9427. DOI 10.1186/1748-7188-9-4. eng. BioMed Central. info:eu-repo/semantics/publishedVersion. info:eu-repo/semantics/article.
Affiliations: Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brazil; Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou Grupo de Genômica Biologia Computacional. Belo Horizonte, MG, Brazil.
BACKGROUND: Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology, and several programs and algorithms are available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next-generation deep sequencing platforms, and the increasing demand for large-scale data analysis, it is imperative to optimize the application of software. Therefore, a balance between alignment accuracy and computational cost has become a critical indicator of the most suitable MSA program. We compared both the accuracy and the cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE and discuss the relevance of some implementations embedded in each program's algorithm. Alignment accuracy was calculated with the two standard scoring functions provided by BAliBASE, the sum-of-pairs and total-column scores, and computational costs were determined by collecting peak memory usage and execution time.
RESULTS: Our results indicate that, for the most part, the consistency-based programs Probcons, T-Coffee, Probalign and MAFFT outperformed the other programs in accuracy. Whenever sequences with large N/C-terminal extensions were present in the BAliBASE suite, Probalign, MAFFT and also CLUSTAL OMEGA outperformed Probcons and T-Coffee. The drawback of these programs is that they are more memory-greedy and slower than POA, CLUSTALW, DIALIGN-TX, and MUSCLE. CLUSTALW and MUSCLE were the fastest programs, with CLUSTALW requiring the least RAM.
CONCLUSIONS: Based on the results presented herein, all four programs Probcons, T-Coffee, Probalign and MAFFT are recommended for better accuracy of multiple sequence alignments. T-Coffee and recent versions of MAFFT can deliver faster and reliable alignments, which are especially suited for datasets larger than those encountered in the BAliBASE suite, if multi-core computers are available. In fact, parallelization of alignmen
Keywords: Multiple sequence alignment; Computer programs; Accuracy; Performance. info:eu-repo/semantics/openAccess. reponame:Repositório Institucional da FIOCRUZ (ARCA) instname:Fundação Oswaldo Cruz (FIOCRUZ) instacron:FIOCRUZ.
Files: license.txt (text/plain; charset=utf-8, size 1914, MD5 7d48279ffeed55da8dfe2f8e81f3b81f); 2014_035.pdf (application/pdf, size 516742, MD5 2fa775121df1fd4b23c2255d7c7684fc); 2014_035.pdf.txt (extracted text, text/plain, size 39848, MD5 ecb192f8a1d6cff6f4cc99b1a9dacfdf). Record icict/9427, 2019-06-19 10:07:37.034, oai:www.arca.fiocruz.br:icict/9427. opendoar:2135. |
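The abstract names the two accuracy measures, the sum-of-pairs (SP) and total-column (TC) scores, without defining them. The following is a minimal Python sketch of how such scores are commonly computed, not the BAliBASE scoring program itself. It assumes that both alignments contain the same sequences in the same order, that `-` marks a gap, and that every reference column is scored (the real benchmark may restrict scoring to annotated core regions). The function names are hypothetical.

```python
from itertools import combinations


def residue_columns(alignment):
    """Per row, map each column to the index of the residue it holds (None for a gap)."""
    rows = []
    for row in alignment:
        next_residue = 0
        mapped = []
        for ch in row:
            if ch == "-":
                mapped.append(None)
            else:
                mapped.append(next_residue)
                next_residue += 1
        rows.append(mapped)
    return rows


def aligned_pairs(alignment):
    """Set of residue pairs ((seq_i, res_i), (seq_j, res_j)) that share a column."""
    rows = residue_columns(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = [(s, rows[s][col]) for s in range(len(rows)) if rows[s][col] is not None]
        pairs.update(combinations(present, 2))
    return pairs


def column_set(alignment):
    """Set of columns, each written as a tuple of residue indices (None for a gap)."""
    rows = residue_columns(alignment)
    return {tuple(r[col] for r in rows) for col in range(len(alignment[0]))}


def sp_score(reference, test):
    """Fraction of residue pairs aligned in the reference that the test also aligns."""
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_score(reference, test):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = column_set(reference)
    return len(ref_cols & column_set(test)) / len(ref_cols)


if __name__ == "__main__":
    reference = ["ACG", "ACG", "ACG"]
    test = ["ACG-", "ACG-", "-ACG"]  # third sequence shifted by one column
    print(sp_score(reference, test), tc_score(reference, test))
```

In this toy case the first two sequences stay correctly aligned to each other, so SP is 1/3, while no reference column is reproduced in full, so TC is 0. This illustrates why TC is the stricter of the two measures.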
dc.title.pt_BR.fl_str_mv |
Assessing the efficiency of multiple sequence alignment programs. |
title |
Assessing the efficiency of multiple sequence alignment programs. |
spellingShingle |
Assessing the efficiency of multiple sequence alignment programs. Pais, Fabiano Sviatopolk Mirsky. Multiple sequence alignment; Computer programs; Accuracy; Performance |
title_short |
Assessing the efficiency of multiple sequence alignment programs. |
title_full |
Assessing the efficiency of multiple sequence alignment programs. |
title_fullStr |
Assessing the efficiency of multiple sequence alignment programs. |
title_full_unstemmed |
Assessing the efficiency of multiple sequence alignment programs. |
title_sort |
Assessing the efficiency of multiple sequence alignment programs. |
author |
Pais, Fabiano Sviatopolk Mirsky |
author_facet |
Pais, Fabiano Sviatopolk Mirsky; Ruy, Patrícia de Cássia; Oliveira, Guilherme Corrêa de; Coimbra, Roney Santos |
author_role |
author |
author2 |
Ruy, Patrícia de Cássia; Oliveira, Guilherme Corrêa de; Coimbra, Roney Santos |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Pais, Fabiano Sviatopolk Mirsky; Ruy, Patrícia de Cássia; Oliveira, Guilherme Corrêa de; Coimbra, Roney Santos |
dc.subject.en.pt_BR.fl_str_mv |
Multiple sequence alignment; Computer programs; Accuracy; Performance |
topic |
Multiple sequence alignment; Computer programs; Accuracy; Performance |
description |
Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brazil/Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou Grupo de Genômica Biologia Computacional. Belo Horizonte, MG, Brazil |
publishDate |
2014 |
dc.date.issued.fl_str_mv |
2014 |
dc.date.accessioned.fl_str_mv |
2015-02-04T12:30:02Z |
dc.date.available.fl_str_mv |
2015-02-04T12:30:02Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
PAIS, Fabiano Sviatopolk Mirsky et al. Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol, V.9, n. 1, p. 4, 2014. |
dc.identifier.uri.fl_str_mv |
https://www.arca.fiocruz.br/handle/icict/9427 |
dc.identifier.issn.none.fl_str_mv |
1748-7188 |
dc.identifier.doi.none.fl_str_mv |
10.1186/1748-7188-9-4 |
identifier_str_mv |
PAIS, Fabiano Sviatopolk Mirsky et al. Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol, V.9, n. 1, p. 4, 2014. 1748-7188 10.1186/1748-7188-9-4. |
url |
https://www.arca.fiocruz.br/handle/icict/9427 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
BioMed Central |
publisher.none.fl_str_mv |
BioMed Central |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da FIOCRUZ (ARCA) instname:Fundação Oswaldo Cruz (FIOCRUZ) instacron:FIOCRUZ |
instname_str |
Fundação Oswaldo Cruz (FIOCRUZ) |
instacron_str |
FIOCRUZ |
institution |
FIOCRUZ |
reponame_str |
Repositório Institucional da FIOCRUZ (ARCA) |
collection |
Repositório Institucional da FIOCRUZ (ARCA) |
bitstream.url.fl_str_mv |
https://www.arca.fiocruz.br/bitstream/icict/9427/1/license.txt https://www.arca.fiocruz.br/bitstream/icict/9427/2/2014_035.pdf https://www.arca.fiocruz.br/bitstream/icict/9427/3/2014_035.pdf.txt |
bitstream.checksum.fl_str_mv |
7d48279ffeed55da8dfe2f8e81f3b81f 2fa775121df1fd4b23c2255d7c7684fc ecb192f8a1d6cff6f4cc99b1a9dacfdf |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ) |
repository.mail.fl_str_mv |
repositorio.arca@fiocruz.br |
_version_ |
1813009185038663680 |