MAFCO: a compression tool for MAF files
Autor(a) principal: | |
---|---|
Data de Publicação: | 2015 |
Outros Autores: | , , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Texto Completo: | http://hdl.handle.net/10773/37053 |
Resumo: | In the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. In order to overcome this problem, the most popular general-purpose compression tool, gzip, is usually used. However, these tools were not specifically designed to compress this kind of data, and often fall short when the intention is to reduce the data size as much as possible. There are several compression algorithms available, even for genomic data, but very few have been designed to deal with Whole Genome Alignments, containing alignments between entire genomes of several species. In this paper, we present a lossless compression tool, MAFCO, specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain from 34% to 57%, depending on the data set. When compared to a recent dedicated method, which is not compatible with some data sets, the compression gain of MAFCO is about 9%. Both source-code and binaries for several operating systems are freely available for non-commercial use at: http://bioinformatics.ua.pt/software/mafco. |
id |
RCAP_346f04b3f13fc533da230816de8fea30 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/37053 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
MAFCO: a compression tool for MAF filesData compressionGenomicsTime factorsSequence alignmentIn the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. In order to overcome this problem, the most popular general-purpose compression tool, gzip, is usually used. However, these tools were not specifically designed to compress this kind of data, and often fall short when the intention is to reduce the data size as much as possible. There are several compression algorithms available, even for genomic data, but very few have been designed to deal with Whole Genome Alignments, containing alignments between entire genomes of several species. In this paper, we present a lossless compression tool, MAFCO, specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain from 34% to 57%, depending on the data set. When compared to a recent dedicated method, which is not compatible with some data sets, the compression gain of MAFCO is about 9%. Both source-code and binaries for several operating systems are freely available for non-commercial use at: http://bioinformatics.ua.pt/software/mafco.PLoS2023-04-14T14:06:18Z2015-01-01T00:00:00Z2015info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://hdl.handle.net/10773/37053eng10.1371/journal.pone.0116082Matos, Luís M. O.Neves, António J. R.Pratas, DiogoPinho, Armando J.info:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-02-22T12:11:25Zoai:ria.ua.pt:10773/37053Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T03:07:42.022833Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse |
dc.title.none.fl_str_mv |
MAFCO: a compression tool for MAF files |
title |
MAFCO: a compression tool for MAF files |
spellingShingle |
MAFCO: a compression tool for MAF files Matos, Luís M. O. Data compression Genomics Time factors Sequence alignment |
title_short |
MAFCO: a compression tool for MAF files |
title_full |
MAFCO: a compression tool for MAF files |
title_fullStr |
MAFCO: a compression tool for MAF files |
title_full_unstemmed |
MAFCO: a compression tool for MAF files |
title_sort |
MAFCO: a compression tool for MAF files |
author |
Matos, Luís M. O. |
author_facet |
Matos, Luís M. O. Neves, António J. R. Pratas, Diogo Pinho, Armando J. |
author_role |
author |
author2 |
Neves, António J. R. Pratas, Diogo Pinho, Armando J. |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Matos, Luís M. O. Neves, António J. R. Pratas, Diogo Pinho, Armando J. |
dc.subject.por.fl_str_mv |
Data compression Genomics Time factors Sequence alignment |
topic |
Data compression Genomics Time factors Sequence alignment |
description |
In the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. In order to overcome this problem, the most popular general-purpose compression tool, gzip, is usually used. However, these tools were not specifically designed to compress this kind of data, and often fall short when the intention is to reduce the data size as much as possible. There are several compression algorithms available, even for genomic data, but very few have been designed to deal with Whole Genome Alignments, containing alignments between entire genomes of several species. In this paper, we present a lossless compression tool, MAFCO, specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain from 34% to 57%, depending on the data set. When compared to a recent dedicated method, which is not compatible with some data sets, the compression gain of MAFCO is about 9%. Both source-code and binaries for several operating systems are freely available for non-commercial use at: http://bioinformatics.ua.pt/software/mafco. |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015-01-01T00:00:00Z 2015 2023-04-14T14:06:18Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/37053 |
url |
http://hdl.handle.net/10773/37053 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1371/journal.pone.0116082 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
PLoS |
publisher.none.fl_str_mv |
PLoS |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799137731224272896 |