Sequential and parallel approaches to reduce the data cube size.

Joubert de Castro Lima

Bibliographic details
Main author:	Joubert de Castro Lima
Publication date:	2009
Document type:	Thesis
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações do ITA
Full text:	http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=786
Abstract:	Since the introduction of Data Warehouse (DW) and Online Analytical Processing (OLAP) technologies, efficient computation of data cubes has become one of the most relevant and pervasive problems in the DW area. The data cube operator has exponential complexity; therefore, materializing a data cube requires both a huge amount of memory and a substantial amount of time. Reducing the size of data cubes, without loss of generality, thus becomes one of the essential aspects of achieving effective OLAP services. Previous approaches substantially reduce the cube size using graph representations. A data cube can be viewed as a set of sub-graphs. In general, these approaches eliminate prefix redundancy and part of the suffix redundancy of a data cube. In this work, we propose three major contributions to reduce the data cube size: the MDAG, MCG and p-Cube approaches. The MDAG approach eliminates the wildcard all (*), which represents an entire aggregation, from the cube representation, using the dimensional ID. It also uses the internal nodes to reduce the height of the cube representation, the number of branches and the number of common suffixed nodes. Unfortunately, the MDAG approach only reduces the data cube suffix redundancy, so to completely eliminate prefix/suffix redundancies we propose the MCG approach. The MCG approach produces a full cube with a reduction ratio of 70-90% when compared to a Star full cube representation. In the same scenarios, the new Star approach, proposed in 2007, reduces memory consumption by only 10-30%, Dwarf by 30-50% and MDAG by 40-60% when compared to the Star approach. Our approaches are, on average, 20-50% faster than the Dwarf and Star approaches. In this work, we also propose a parallel cube approach, named p-Cube. The p-Cube approach improves the runtime of the Star, MDAG and MCG approaches, while keeping their low memory consumption benefits. The p-Cube approach uses an attribute-based data cube decomposition strategy which combines both task and data parallelism. It uses the dimensions' attribute values to partition the data cube into a set of disjoint sub-cubes of similar size. The p-Cube approach provides similar memory consumption among its threads. Its logical design can be implemented in shared-memory, distributed-memory and hybrid architectures with minimal adaptation.
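
To make two points of the abstract concrete, the sketch below computes a naive full cube (one aggregate for each of the 2^d group-bys, which is where the exponential cost comes from) and then rebuilds the same cube from sub-cubes obtained by partitioning the rows on the attribute values of one dimension. It is only an illustration under stated assumptions: the function names, the sample rows, the choice of SUM as the measure and the choice of the first dimension as the partitioning attribute are all hypothetical, and this is not the MDAG, MCG or p-Cube algorithm from the thesis.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # wildcard standing for "aggregated over this dimension"


def full_cube(rows, n_dims):
    """Naive full cube: SUM of the measure for every one of the 2**n_dims group-bys."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                # Keep the values of the chosen dimensions, replace the rest by '*'.
                cell = tuple(dims[i] if i in kept else ALL for i in range(n_dims))
                cube[cell] += measure
    return dict(cube)


def partition_by_first_dim(rows):
    """Split the rows by the attribute value of dimension 0 (hypothetical choice)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0]].append(row)
    return dict(parts)


def cube_from_partitions(rows, n_dims):
    """Compute one sub-cube per partition and merge them by summing cells.

    Cells that fix dimension 0 come from exactly one partition (disjoint);
    cells with '*' in dimension 0 are summed across partitions, which is
    valid because SUM is a distributive aggregate.
    """
    merged = defaultdict(int)
    for part in partition_by_first_dim(rows).values():
        for cell, total in full_cube(part, n_dims).items():
            merged[cell] += total
    return dict(merged)


# Hypothetical fact rows: three dimensions followed by one numeric measure.
rows = [
    ("a1", "b1", "c1", 1),
    ("a1", "b2", "c1", 2),
    ("a2", "b1", "c2", 3),
]
cube = full_cube(rows, 3)
merged = cube_from_partitions(rows, 3)
```

Each call to full_cube on a partition is independent of the others, so in principle the sub-cubes could be built by separate threads or processes before the merge step; how the thesis balances sub-cube sizes is not described in this record.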

Item metadata

id	ITA_399de2d08889239412846b03031cc53b
oai_identifier_str	oai:agregador.ibict.br.BDTD_ITA:oai:ita.br:786
network_acronym_str	ITA
network_name_str	Biblioteca Digital de Teses e Dissertações do ITA
dc.title.none.fl_str_mv	Sequential and parallel approaches to reduce the data cube size.
title	Sequential and parallel approaches to reduce the data cube size.
spellingShingle	Sequential and parallel approaches to reduce the data cube size. Joubert de Castro Lima Data mining; Data warehouse; Data structures; Data storage; Databases; Computational complexity; Parallel processing (computers); Computing
title_short	Sequential and parallel approaches to reduce the data cube size.
title_full	Sequential and parallel approaches to reduce the data cube size.
title_fullStr	Sequential and parallel approaches to reduce the data cube size.
title_full_unstemmed	Sequential and parallel approaches to reduce the data cube size.
title_sort	Sequential and parallel approaches to reduce the data cube size.
author	Joubert de Castro Lima
author_facet	Joubert de Castro Lima
author_role	author
dc.contributor.none.fl_str_mv	Celso Massaki Hirata
dc.contributor.author.fl_str_mv	Joubert de Castro Lima
dc.subject.por.fl_str_mv	Data mining; Data warehouse; Data structures; Data storage; Databases; Computational complexity; Parallel processing (computers); Computing
topic	Data mining; Data warehouse; Data structures; Data storage; Databases; Computational complexity; Parallel processing (computers); Computing
publishDate	2009
dc.date.none.fl_str_mv	2009-05-08
dc.type.driver.fl_str_mv	info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/doctoralThesis
status_str	publishedVersion
format	doctoralThesis
dc.identifier.uri.fl_str_mv	http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=786
url	http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=786
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Instituto Tecnológico de Aeronáutica
publisher.none.fl_str_mv	Instituto Tecnológico de Aeronáutica
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações do ITA instname:Instituto Tecnológico de Aeronáutica instacron:ITA
reponame_str	Biblioteca Digital de Teses e Dissertações do ITA
collection	Biblioteca Digital de Teses e Dissertações do ITA
instname_str	Instituto Tecnológico de Aeronáutica
instacron_str	ITA
institution	ITA
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações do ITA - Instituto Tecnológico de Aeronáutica
repository.mail.fl_str_mv
subject_por_txtF_mv	Data mining; Data warehouse; Data structures; Data storage; Databases; Computational complexity; Parallel processing (computers); Computing
_version_	1706809262521450496
