Double distance-calculation-pruning for similarity search

Bibliographic details
Main author: Pola, Ives Renê Venturini
Publication date: 2018
Other authors: Pola, Fernanda Paula Barbosa, Eler, Danilo Medeiros [UNESP]
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.3390/info9050124
http://hdl.handle.net/11449/179874
Abstract: Many modern applications deal with complex data, for which retrieval by similarity plays an important role. The main comparison mechanisms for complex data are based on similarity predicates. The data are usually embedded in metric spaces, where distance functions express similarity and a lower bound property is commonly used to avoid distance calculations. Retrieval by similarity is implemented by unary and binary operators. Most studies have aimed at improving the efficiency of unary operators, either by using metric access methods or by using mathematical properties to prune parts of the search space during query answering. Studies on binary operators for similarity joins also aim to improve efficiency, and most of them use only the metric lower bound property for pruning. However, they depend on query parameters, such as the range radius. In this paper, we propose a generic concept that uses both the lower and upper bound properties from metric space theory to avoid more element comparisons. The concept can be applied to any existing similarity retrieval method. We analyze the increase in pruning power and show an example of its application to classical nested-loop join algorithms. Practical evaluation over both synthetic and real data sets shows that our method reduces the number of distance evaluations in similarity joins.
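The idea of pruning with both bounds can be illustrated with a small nested-loop range join. The following is only a minimal sketch under stated assumptions, not the authors' algorithm: the function name `range_join`, the use of a single fixed pivot, and the Euclidean distance from `math.dist` are all hypothetical choices made for illustration. For a pivot p, the triangle inequality gives |d(a,p) - d(b,p)| <= d(a,b) <= d(a,p) + d(b,p), so a pair can be discarded when the lower bound exceeds the radius and accepted when the upper bound is within it, in both cases without computing d(a,b).

```python
import math


def range_join(A, B, r, pivot, dist=math.dist):
    """Nested-loop range join with lower- and upper-bound pruning.

    Returns the index pairs (i, j) with dist(A[i], B[j]) <= r and the
    number of pair distances that actually had to be evaluated.
    Assumption: `dist` is a metric, so the triangle inequality holds.
    """
    # Distances to the pivot are computed once per element.
    da = [dist(a, pivot) for a in A]
    db = [dist(b, pivot) for b in B]

    result, evaluated = [], 0
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            lower = abs(da[i] - db[j])  # d(a, b) is at least this
            upper = da[i] + db[j]       # d(a, b) is at most this
            if lower > r:
                continue                # pruned: cannot qualify
            if upper <= r:
                result.append((i, j))   # accepted without evaluation
                continue
            evaluated += 1              # bounds are inconclusive
            if dist(a, b) <= r:
                result.append((i, j))
    return result, evaluated
```

How many evaluations are saved depends on the pivot and on the radius; with floating-point distances, pairs lying exactly on a bound may need a small tolerance.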
id UNSP_376c0eb6e80615dba58425341b591cf5
oai_identifier_str oai:repositorio.unesp.br:11449/179874
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
affiliations Department of Informatics, Federal University of Technology-UTFPR; Department of Mathematics, Federal University of Technology-UTFPR; São Paulo State University-UNESP, Bairro: Centro Educacional, Rua Roberto Simonsen, 305
dc.title.none.fl_str_mv Double distance-calculation-pruning for similarity search
dc.contributor.none.fl_str_mv Federal University of Technology-UTFPR
Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv Pola, Ives Renê Venturini
Pola, Fernanda Paula Barbosa
Eler, Danilo Medeiros [UNESP]
dc.subject.por.fl_str_mv Information retrieval
Metric indexing
Similarity joins
publishDate 2018
dc.date.none.fl_str_mv 2018-12-11T17:37:07Z
2018-05-17
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.3390/info9050124
Information (Switzerland), v. 9, n. 5, 2018.
2078-2489
http://hdl.handle.net/11449/179874
10.3390/info9050124
2-s2.0-85047145811
2-s2.0-85047145811.pdf
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Information (Switzerland)
0,222
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)