Sequential and parallel approaches to reduce the data cube size.

Detalhes bibliográficos
Autor(a) principal: Joubert de Castro Lima
Data de Publicação: 2009
Tipo de documento: Tese
Idioma: eng
Título da fonte: Biblioteca Digital de Teses e Dissertações do ITA
Texto Completo: http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=786
Resumo: Since the introduction of Data Warehouse (DW) and Online Analytical Processing (OLAP) technologies, efficient computation of data cubes has become one of the most relevant and pervasive problems in the DW area. The data cube operator has exponential complexity; therefore, the materialization of a data cube involves both huge amount of memory and substantial amount of time for its generation. Reducing the size of data cubes, without loss of generality, thus becomes one of the essential aspects for achieving effective OLAP services. Previous approaches reduce substantially the cube size using graph representations. A data cube can be viewed as a set of sub-graphs. In general, the approaches eliminate prefix redundancy and part of suffix redundancy of a data cube. In this work, we propose three major contributions to reduce the data cube size: MDAG, MCG and p-Cube Approaches. The MDAG approach eliminates the wildcard all (*), which represents an entire aggregation, from the cube representation, using the dimensional ID. It also uses the internal nodes to reduce the cube representation height, number of branches and number of common suffixed nodes. Unfortunately, the MDAG approach just reduces the data cube suffix redundancy, so in order to complete eliminate prefix/suffix redundancies we propose the MCG approach. The MCG approach produces a full cube with a reduction ratio of 70-90% when compared to a Star full cube representation. In the same scenarios, the new Star approach, proposed in 2007, reduces only 10-30%, Dwarf 30-50% and MDAG 40-60% of memory consumption when compared to Star approach. Our approaches are, on average, 20-50% faster than Dwarf and Star approaches. In this work, we also propose a parallel cube approach, named p-Cube. The p-Cube approach improves the runtime of Star, MDAG and MCG approaches, while keeping their low memory consumption benefits. The p-Cube approach uses an attribute-based data cube decomposition strategy which combines both task and data parallelism. It uses the dimensions attribute values to partition the data cube into a set of disjoint sub-cubes with similar size. The p-Cube approach provides similar memory consumption among its threads. Its logical design can be implemented in shared-memory, distributed-memory and hybrid architectures with minimal adaptation.
id ITA_399de2d08889239412846b03031cc53b
oai_identifier_str oai:agregador.ibict.br.BDTD_ITA:oai:ita.br:786
network_acronym_str ITA
network_name_str Biblioteca Digital de Teses e Dissertações do ITA
spelling Sequential and parallel approaches to reduce the data cube size.Mineração de dadosCeleiro de dadosEstrutura de dadosArmazenamento de dadosBanco de dadosComplexidade computacionalProcessamento em paralelo (computadores)ComputaçãoSince the introduction of Data Warehouse (DW) and Online Analytical Processing (OLAP) technologies, efficient computation of data cubes has become one of the most relevant and pervasive problems in the DW area. The data cube operator has exponential complexity; therefore, the materialization of a data cube involves both huge amount of memory and substantial amount of time for its generation. Reducing the size of data cubes, without loss of generality, thus becomes one of the essential aspects for achieving effective OLAP services. Previous approaches reduce substantially the cube size using graph representations. A data cube can be viewed as a set of sub-graphs. In general, the approaches eliminate prefix redundancy and part of suffix redundancy of a data cube. In this work, we propose three major contributions to reduce the data cube size: MDAG, MCG and p-Cube Approaches. The MDAG approach eliminates the wildcard all (*), which represents an entire aggregation, from the cube representation, using the dimensional ID. It also uses the internal nodes to reduce the cube representation height, number of branches and number of common suffixed nodes. Unfortunately, the MDAG approach just reduces the data cube suffix redundancy, so in order to complete eliminate prefix/suffix redundancies we propose the MCG approach. The MCG approach produces a full cube with a reduction ratio of 70-90% when compared to a Star full cube representation. In the same scenarios, the new Star approach, proposed in 2007, reduces only 10-30%, Dwarf 30-50% and MDAG 40-60% of memory consumption when compared to Star approach. Our approaches are, on average, 20-50% faster than Dwarf and Star approaches. In this work, we also propose a parallel cube approach, named p-Cube. The p-Cube approach improves the runtime of Star, MDAG and MCG approaches, while keeping their low memory consumption benefits. The p-Cube approach uses an attribute-based data cube decomposition strategy which combines both task and data parallelism. It uses the dimensions attribute values to partition the data cube into a set of disjoint sub-cubes with similar size. The p-Cube approach provides similar memory consumption among its threads. Its logical design can be implemented in shared-memory, distributed-memory and hybrid architectures with minimal adaptation.Instituto Tecnológico de AeronáuticaCelso Massaki HirataJoubert de Castro Lima2009-05-08info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesishttp://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=786reponame:Biblioteca Digital de Teses e Dissertações do ITAinstname:Instituto Tecnológico de Aeronáuticainstacron:ITAenginfo:eu-repo/semantics/openAccessapplication/pdf2019-02-02T14:01:55Zoai:agregador.ibict.br.BDTD_ITA:oai:ita.br:786http://oai.bdtd.ibict.br/requestopendoar:null2020-05-28 19:34:22.576Biblioteca Digital de Teses e Dissertações do ITA - Instituto Tecnológico de Aeronáuticatrue
dc.title.none.fl_str_mv Sequential and parallel approaches to reduce the data cube size.
title Sequential and parallel approaches to reduce the data cube size.
spellingShingle Sequential and parallel approaches to reduce the data cube size.
Joubert de Castro Lima
Mineração de dados
Celeiro de dados
Estrutura de dados
Armazenamento de dados
Banco de dados
Complexidade computacional
Processamento em paralelo (computadores)
Computação
title_short Sequential and parallel approaches to reduce the data cube size.
title_full Sequential and parallel approaches to reduce the data cube size.
title_fullStr Sequential and parallel approaches to reduce the data cube size.
title_full_unstemmed Sequential and parallel approaches to reduce the data cube size.
title_sort Sequential and parallel approaches to reduce the data cube size.
author Joubert de Castro Lima
author_facet Joubert de Castro Lima
author_role author
dc.contributor.none.fl_str_mv Celso Massaki Hirata
dc.contributor.author.fl_str_mv Joubert de Castro Lima
dc.subject.por.fl_str_mv Mineração de dados
Celeiro de dados
Estrutura de dados
Armazenamento de dados
Banco de dados
Complexidade computacional
Processamento em paralelo (computadores)
Computação
topic Mineração de dados
Celeiro de dados
Estrutura de dados
Armazenamento de dados
Banco de dados
Complexidade computacional
Processamento em paralelo (computadores)
Computação
dc.description.none.fl_txt_mv Since the introduction of Data Warehouse (DW) and Online Analytical Processing (OLAP) technologies, efficient computation of data cubes has become one of the most relevant and pervasive problems in the DW area. The data cube operator has exponential complexity; therefore, the materialization of a data cube involves both huge amount of memory and substantial amount of time for its generation. Reducing the size of data cubes, without loss of generality, thus becomes one of the essential aspects for achieving effective OLAP services. Previous approaches reduce substantially the cube size using graph representations. A data cube can be viewed as a set of sub-graphs. In general, the approaches eliminate prefix redundancy and part of suffix redundancy of a data cube. In this work, we propose three major contributions to reduce the data cube size: MDAG, MCG and p-Cube Approaches. The MDAG approach eliminates the wildcard all (*), which represents an entire aggregation, from the cube representation, using the dimensional ID. It also uses the internal nodes to reduce the cube representation height, number of branches and number of common suffixed nodes. Unfortunately, the MDAG approach just reduces the data cube suffix redundancy, so in order to complete eliminate prefix/suffix redundancies we propose the MCG approach. The MCG approach produces a full cube with a reduction ratio of 70-90% when compared to a Star full cube representation. In the same scenarios, the new Star approach, proposed in 2007, reduces only 10-30%, Dwarf 30-50% and MDAG 40-60% of memory consumption when compared to Star approach. Our approaches are, on average, 20-50% faster than Dwarf and Star approaches. In this work, we also propose a parallel cube approach, named p-Cube. The p-Cube approach improves the runtime of Star, MDAG and MCG approaches, while keeping their low memory consumption benefits. The p-Cube approach uses an attribute-based data cube decomposition strategy which combines both task and data parallelism. It uses the dimensions attribute values to partition the data cube into a set of disjoint sub-cubes with similar size. The p-Cube approach provides similar memory consumption among its threads. Its logical design can be implemented in shared-memory, distributed-memory and hybrid architectures with minimal adaptation.
description Since the introduction of Data Warehouse (DW) and Online Analytical Processing (OLAP) technologies, efficient computation of data cubes has become one of the most relevant and pervasive problems in the DW area. The data cube operator has exponential complexity; therefore, the materialization of a data cube involves both huge amount of memory and substantial amount of time for its generation. Reducing the size of data cubes, without loss of generality, thus becomes one of the essential aspects for achieving effective OLAP services. Previous approaches reduce substantially the cube size using graph representations. A data cube can be viewed as a set of sub-graphs. In general, the approaches eliminate prefix redundancy and part of suffix redundancy of a data cube. In this work, we propose three major contributions to reduce the data cube size: MDAG, MCG and p-Cube Approaches. The MDAG approach eliminates the wildcard all (*), which represents an entire aggregation, from the cube representation, using the dimensional ID. It also uses the internal nodes to reduce the cube representation height, number of branches and number of common suffixed nodes. Unfortunately, the MDAG approach just reduces the data cube suffix redundancy, so in order to complete eliminate prefix/suffix redundancies we propose the MCG approach. The MCG approach produces a full cube with a reduction ratio of 70-90% when compared to a Star full cube representation. In the same scenarios, the new Star approach, proposed in 2007, reduces only 10-30%, Dwarf 30-50% and MDAG 40-60% of memory consumption when compared to Star approach. Our approaches are, on average, 20-50% faster than Dwarf and Star approaches. In this work, we also propose a parallel cube approach, named p-Cube. The p-Cube approach improves the runtime of Star, MDAG and MCG approaches, while keeping their low memory consumption benefits. The p-Cube approach uses an attribute-based data cube decomposition strategy which combines both task and data parallelism. It uses the dimensions attribute values to partition the data cube into a set of disjoint sub-cubes with similar size. The p-Cube approach provides similar memory consumption among its threads. Its logical design can be implemented in shared-memory, distributed-memory and hybrid architectures with minimal adaptation.
publishDate 2009
dc.date.none.fl_str_mv 2009-05-08
dc.type.driver.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/doctoralThesis
status_str publishedVersion
format doctoralThesis
dc.identifier.uri.fl_str_mv http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=786
url http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=786
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Instituto Tecnológico de Aeronáutica
publisher.none.fl_str_mv Instituto Tecnológico de Aeronáutica
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações do ITA
instname:Instituto Tecnológico de Aeronáutica
instacron:ITA
reponame_str Biblioteca Digital de Teses e Dissertações do ITA
collection Biblioteca Digital de Teses e Dissertações do ITA
instname_str Instituto Tecnológico de Aeronáutica
instacron_str ITA
institution ITA
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações do ITA - Instituto Tecnológico de Aeronáutica
repository.mail.fl_str_mv
subject_por_txtF_mv Mineração de dados
Celeiro de dados
Estrutura de dados
Armazenamento de dados
Banco de dados
Complexidade computacional
Processamento em paralelo (computadores)
Computação
_version_ 1706809262521450496