LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS

Bibliographic details
Main author: Ferreira, Artur
Publication date: 2013
Other authors: Oliveira, Arlindo; Figueiredo, Mario
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://doi.org/10.34629/ipl.isel.i-ETC.6
Abstract: The sliding-window, dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are widely used for universal lossless data compression. The encoding component of these algorithms performs repeated substring searches. Data structures such as hash tables, binary search trees, and suffix trees have been used to speed up these searches, at the expense of memory usage. Previous work has shown how suffix arrays (SA) can be used for dictionary representation and LZ77 decomposition. In this paper, we improve on that work by proposing a new, efficient algorithm to update the sliding window each time a token is produced at the output. The proposed algorithm toggles between two SAs on consecutive tokens. The resulting SA-based encoder requires less memory than conventional tree-based encoders. Comparing our SA-based technique against tree-based encoders on a large set of benchmark files, we find that, in some compression settings, our encoder is also faster.
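Illustrative sketch (not from the paper): the abstract describes an LZ77 encoder whose dictionary is a suffix array over the sliding window. The Python below shows only that general idea, under loud assumptions. It rebuilds the suffix array from scratch for every token with a naive sort, which is exactly the cost the paper's update algorithm is meant to avoid; the two-SA toggling scheme is not reproduced here. All function names, the (distance, length, next symbol) token layout, and the window and lookahead sizes are hypothetical choices made for the example.

```python
def build_suffix_array(window: bytes) -> list[int]:
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(window)), key=lambda i: window[i:])


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(window: bytes, sa: list[int], pattern: bytes) -> tuple[int, int]:
    """Return (position in window, length) of the longest prefix of pattern
    that occurs in window. Binary search finds where pattern would be
    inserted in suffix order; the best match is a neighbor of that point."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if window[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    best_pos, best_len = 0, 0
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            length = common_prefix_len(window[sa[idx]:], pattern)
            if length > best_len:
                best_pos, best_len = sa[idx], length
    return best_pos, best_len


def lz77_encode(data: bytes, window_size: int = 32, lookahead_size: int = 8):
    """Emit (distance, length, next_symbol) tokens. Assumption: the SA is
    rebuilt per token, unlike the incremental update the paper proposes."""
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window_size)
        window = data[start:i]
        lookahead = data[i:i + lookahead_size]
        sa = build_suffix_array(window)
        # Reserve the last lookahead byte so a literal symbol always follows.
        pos, length = longest_match(window, sa, lookahead[:-1])
        distance = i - (start + pos) if length else 0
        tokens.append((distance, length, data[i + length]))
        i += length + 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Rebuild the data by copying each match and appending the literal."""
    out = bytearray()
    for distance, length, symbol in tokens:
        start = len(out) - distance
        out += out[start:start + length]
        out.append(symbol)
    return bytes(out)
```

A round trip such as lz77_decode(lz77_encode(b"abracadabra abracadabra")) should return the original bytes. Matches are confined to the window (they never run into the lookahead buffer), which keeps the decoder a plain slice copy.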
id RCAP_32d42360faa39c6257ba5c64ba48690b
oai_identifier_str oai:i-ETC.journals.isel.pt:article/6
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.title.none.fl_str_mv LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS
dc.contributor.author.fl_str_mv Ferreira, Artur
Oliveira, Arlindo
Figueiredo, Mario
dc.subject.por.fl_str_mv Computers; Data Compression; Data Structures
Lempel-Ziv compression; suffix arrays; sliding window update; substring search
publishDate 2013
dc.date.none.fl_str_mv 2013-06-27T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://doi.org/10.34629/ipl.isel.i-ETC.6
oai:i-ETC.journals.isel.pt:article/6
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv http://journals.isel.pt/index.php/i-ETC/article/view/6
https://doi.org/10.34629/ipl.isel.i-ETC.6
http://journals.isel.pt/index.php/i-ETC/article/view/6/6
dc.rights.driver.fl_str_mv Copyright (c) 2013 i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers
http://creativecommons.org/licenses/by-nc/4.0
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv ISEL - High Institute of Engineering of Lisbon
dc.source.none.fl_str_mv i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers; Vol 2, No 1 (2013): The CETC2011 Issue; ID-4
2182-4010
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
_version_ 1799130375460487168