LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS
Main author: | Ferreira, Artur |
---|---|
Publication date: | 2013 |
Other authors: | Oliveira, Arlindo; Figueiredo, Mario |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://doi.org/10.34629/ipl.isel.i-ETC.6 |
Abstract: | The sliding window dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are widely used for universal lossless data compression. The encoding component of these algorithms performs repeated substring searches. Data structures such as hash tables, binary search trees, and suffix trees have been used to speed up these searches, at the expense of memory usage. Previous work has shown how suffix arrays (SAs) can be used for dictionary representation and LZ77 decomposition. In this paper, we improve on that work by proposing a new, efficient algorithm to update the sliding window each time a token is produced at the output. The proposed algorithm toggles between two SAs on consecutive tokens. The resulting SA-based encoder requires less memory than conventional tree-based encoders. Comparing our SA-based technique against tree-based encoders on a large set of benchmark files, we find that, in some compression settings, our encoder is also faster. |
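
The substring search described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it rebuilds a naive suffix array of the window for every token, which is precisely the cost that the paper's window-update scheme is meant to avoid. The function names, the token layout (position, length, literal), and the default window and lookahead sizes are assumptions chosen for illustration only.

```python
def build_suffix_array(window):
    """Naive suffix array: suffix start positions sorted lexicographically."""
    return sorted(range(len(window)), key=lambda i: window[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(window, sa, lookahead):
    """Return (position, length) of the longest prefix of lookahead in window."""
    # Binary search for where lookahead would be inserted among the suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if window[sa[mid]:] < lookahead:
            lo = mid + 1
        else:
            hi = mid
    # The best match is always one of the two neighbours of that point.
    best_pos, best_len = 0, 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            length = common_prefix_len(window[sa[k]:], lookahead)
            if length > best_len:
                best_pos, best_len = sa[k], length
    return best_pos, best_len


def lz77_encode(data, window_size=16, lookahead_size=8):
    """Emit (position, length, literal) tokens; position is relative to the window start."""
    tokens = []
    i = 0
    while i < len(data):
        window = data[max(0, i - window_size):i]
        sa = build_suffix_array(window)  # rebuilt from scratch: illustration only
        lookahead = data[i:i + lookahead_size]
        # Keep the last lookahead symbol free so a literal always follows the match.
        pos, length = longest_match(window, sa, lookahead[:-1])
        tokens.append((pos, length, data[i + length]))
        i += length + 1
    return tokens


def lz77_decode(tokens, window_size=16):
    """Invert lz77_encode by copying matches out of the already decoded window."""
    out = bytearray()
    for pos, length, literal in tokens:
        start = max(0, len(out) - window_size)
        out += out[start + pos:start + pos + length]
        out.append(literal)
    return bytes(out)


if __name__ == "__main__":
    sample = b"abracadabra abracadabra"
    encoded = lz77_encode(sample)
    print(encoded)
    print(lz77_decode(encoded) == sample)
```

In this sketch matches are confined to the window and never overlap the lookahead buffer, which keeps the decoder to a plain slice copy. A practical encoder would update the suffix array incrementally as the window slides instead of sorting it again for each token.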
id |
RCAP_32d42360faa39c6257ba5c64ba48690b |
---|---|
oai_identifier_str |
oai:i-ETC.journals.isel.pt:article/6 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS |
author |
Ferreira, Artur |
author_facet |
Ferreira, Artur Oliveira, Arlindo Figueiredo, Mario |
author_role |
author |
author2 |
Oliveira, Arlindo Figueiredo, Mario |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Ferreira, Artur Oliveira, Arlindo Figueiredo, Mario |
dc.subject.por.fl_str_mv |
Computers; Data Compression; Data Structures Lempel-Ziv compression; suffix arrays; sliding window update; substring search |
topic |
Computers; Data Compression; Data Structures Lempel-Ziv compression; suffix arrays; sliding window update; substring search |
publishDate |
2013 |
dc.date.none.fl_str_mv |
2013-06-27T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://doi.org/10.34629/ipl.isel.i-ETC.6 oai:i-ETC.journals.isel.pt:article/6 |
url |
https://doi.org/10.34629/ipl.isel.i-ETC.6 |
identifier_str_mv |
oai:i-ETC.journals.isel.pt:article/6 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
http://journals.isel.pt/index.php/i-ETC/article/view/6 https://doi.org/10.34629/ipl.isel.i-ETC.6 http://journals.isel.pt/index.php/i-ETC/article/view/6/6 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2013 i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers http://creativecommons.org/licenses/by-nc/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2013 i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers http://creativecommons.org/licenses/by-nc/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
ISEL - High Institute of Engineering of Lisbon |
publisher.none.fl_str_mv |
ISEL - High Institute of Engineering of Lisbon |
dc.source.none.fl_str_mv |
i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers; Vol 2, No 1 (2013): The CETC2011 Issue; ID-4 2182-4010 reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |