DBM-Tree: trading height-balancing for performance in metric access methods

Bibliographic details
Main author: Vieira, Marcos R.
Publication date: 2006
Other authors: Traina Jr., Caetano; Chino, Fabio J. T.; Traina, Agma J. M.
Document type: Article
Language: eng
Source title: Journal of the Brazilian Computer Society
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002006000100004
Abstract: Metric Access Methods (MAMs) are employed to accelerate the processing of similarity queries, such as range and k-nearest neighbor queries. Current methods, such as the Slim-tree and the M-tree, improve query performance by minimizing the number of disk accesses and keeping a constant height for the structures stored on disk (height-balanced trees). However, the overlap between their nodes strongly influences their performance. This paper presents a new dynamic MAM called the DBM-tree (Density-Based Metric tree), which can minimize the overlap between high-density nodes by relaxing the height-balancing of the structure. Thus, the tree is taller in denser regions, in order to keep a tradeoff between breadth-searching and depth-searching. Because cost estimation on tree structures usually relies on their height, we show a height-independent cost model that can be applied to the DBM-tree. Moreover, an optimization algorithm called Shrink is also presented, which improves the performance of an already built DBM-tree by reorganizing the elements among its nodes. Experiments performed over both synthetic and real-world datasets showed that the DBM-tree is, on average, 50% faster than traditional MAMs and reduces the number of distance calculations by up to 72% and disk accesses by up to 66%. After the Shrink algorithm is applied, performance improves by up to 40% regarding the number of disk accesses for range and k-nearest neighbor queries. In addition, the DBM-tree scales up well, exhibiting linear performance as the number of elements in the database grows.
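As a rough illustration of the two query types named in the abstract, the Python sketch below runs a range query and a k-nearest neighbor query by brute force over a small in-memory dataset, and shows the triangle-inequality test that ball-shaped tree nodes commonly use to skip a subtree. It is not the DBM-tree algorithm and takes nothing from the paper's implementation: every function name, the Euclidean metric, and the representation of a node as a center plus covering radius are assumptions made only for this example.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric (an assumption; any function obeying the metric axioms works)."""
    return math.dist(a, b)


def range_query(data: Sequence[Point], q: Point, radius: float,
                d: Metric = euclidean) -> List[Point]:
    """Return every element whose distance to q is at most radius (linear scan)."""
    return [x for x in data if d(q, x) <= radius]


def knn_query(data: Sequence[Point], q: Point, k: int,
              d: Metric = euclidean) -> List[Point]:
    """Return the k elements closest to q, nearest first (linear scan)."""
    return heapq.nsmallest(k, data, key=lambda x: d(q, x))


def ball_can_be_pruned(center: Point, covering_radius: float,
                       q: Point, radius: float,
                       d: Metric = euclidean) -> bool:
    """Hypothetical node test: a ball (center, covering_radius) cannot contain
    any answer to the range query (q, radius) when d(q, center) exceeds the sum
    of the two radii, which follows from the triangle inequality."""
    return d(q, center) > radius + covering_radius


data = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
print(range_query(data, (0.0, 0.0), 1.0))
print(knn_query(data, (0.0, 0.0), 3))
print(ball_can_be_pruned((10.0, 10.0), 1.0, (0.0, 0.0), 2.0))
```

A linear scan computes one distance per stored element. An index that can discard whole nodes with a test like `ball_can_be_pruned` avoids many of those computations, and overlapping nodes weaken that test because fewer balls can be ruled out for a given query.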
id UFRGS-28_2cd266ecf2cbae5c7db9c64d9031410d
oai_identifier_str oai:scielo:S0104-65002006000100004
network_acronym_str UFRGS-28
network_name_str Journal of the Brazilian Computer Society
repository_id_str
spelling DBM-Tree: trading height-balancing for performance in metric access methods; Metric Access Method; Metric Tree; Indexing; Similarity Queries; Sociedade Brasileira de Computação; 2006-04-01; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002006000100004; Journal of the Brazilian Computer Society v.11 n.3 2006; reponame:Journal of the Brazilian Computer Society; instname:Sociedade Brasileira de Computação (SBC); instacron:UFRGS; 10.1007/BF03192381; info:eu-repo/semantics/openAccess; Vieira,Marcos R.; Traina Jr.,Caetano; Chino,Fabio J. T.; Traina,Agma J. M.; eng; 2010-10-26T00:00:00Z; oai:scielo:S0104-65002006000100004; Journal; https://journal-bcs.springeropen.com/; PUB; https://old.scielo.br/oai/scielo-oai.php; jbcs@icmc.sc.usp.br; 1678-4804; 0104-6500; opendoar:; 2010-10-26T00:00; Journal of the Brazilian Computer Society - Sociedade Brasileira de Computação (SBC); false
dc.title.none.fl_str_mv DBM-Tree: trading height-balancing for performance in metric access methods
title DBM-Tree: trading height-balancing for performance in metric access methods
author Vieira,Marcos R.
author_facet Vieira,Marcos R.
Traina Jr.,Caetano
Chino,Fabio J. T.
Traina,Agma J. M.
author_role author
author2 Traina Jr.,Caetano
Chino,Fabio J. T.
Traina,Agma J. M.
author2_role author
author
author
dc.contributor.author.fl_str_mv Vieira,Marcos R.
Traina Jr.,Caetano
Chino,Fabio J. T.
Traina,Agma J. M.
dc.subject.por.fl_str_mv Metric Access Method
Metric Tree
Indexing
Similarity Queries
publishDate 2006
dc.date.none.fl_str_mv 2006-04-01
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002006000100004
url http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002006000100004
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 10.1007/BF03192381
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv text/html
dc.publisher.none.fl_str_mv Sociedade Brasileira de Computação
publisher.none.fl_str_mv Sociedade Brasileira de Computação
dc.source.none.fl_str_mv Journal of the Brazilian Computer Society v.11 n.3 2006
reponame:Journal of the Brazilian Computer Society
instname:Sociedade Brasileira de Computação (SBC)
instacron:UFRGS
instname_str Sociedade Brasileira de Computação (SBC)
instacron_str UFRGS
institution UFRGS
reponame_str Journal of the Brazilian Computer Society
collection Journal of the Brazilian Computer Society
repository.name.fl_str_mv Journal of the Brazilian Computer Society - Sociedade Brasileira de Computação (SBC)
repository.mail.fl_str_mv jbcs@icmc.sc.usp.br
_version_ 1754734669907099648