Double distance-calculation-pruning for similarity search
Main author: | Pola, Ives Renê Venturini |
---|---|
Publication date: | 2018 |
Other authors: | Pola, Fernanda Paula Barbosa; Eler, Danilo Medeiros [UNESP] |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.3390/info9050124 http://hdl.handle.net/11449/179874 |
Abstract: | Many modern applications deal with complex data, where retrieval by similarity plays an important role. Complex data are compared mainly through similarity predicates. The data are usually embedded in metric spaces, where a distance function expresses similarity and a lower-bound property is used to avoid distance calculations. Retrieval by similarity is implemented by unary and binary operators. Most studies have aimed at improving the efficiency of unary operators, either by using metric access methods or by exploiting mathematical properties to prune parts of the search space during query answering. Studies on binary operators for similarity joins also aim to improve efficiency, and most of them use only the metric lower-bound property for pruning. However, they depend on the query parameters, such as the range radius. In this paper, we propose a generic concept that uses both lower- and upper-bound properties, based on metric space theory, to avoid more element comparisons. The concept can be applied to any existing similarity retrieval method. We analyze the increase in pruning power and show an example of its application to classical nested-loop join algorithms. Practical evaluation over both synthetic and real data sets shows that our method reduces the number of distance evaluations in similarity joins. |
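
The idea of pruning with both a lower and an upper bound can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes a range join with a single pivot and Euclidean distance, and the names `range_join_double_pruning`, `pivot`, and `radius` are hypothetical. For a pivot p, the triangle inequality gives |d(p,a) - d(p,b)| <= d(a,b) <= d(p,a) + d(p,b), so a pair can be discarded when the lower bound exceeds the radius and accepted when the upper bound does not, in both cases without computing d(a,b).

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, a metric (it satisfies the triangle inequality)."""
    return math.dist(a, b)


def range_join_double_pruning(
    left: Sequence[Point],
    right: Sequence[Point],
    radius: float,
    pivot: Point,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Tuple[List[Tuple[Point, Point]], int]:
    """Nested-loop range join that prunes with lower AND upper bounds.

    Returns the qualifying pairs and the number of pairwise distance
    evaluations actually performed (pivot distances are not counted).
    Assumption: a single, arbitrarily chosen pivot.
    """
    # Distances to the pivot are computed once per element.
    left_to_pivot = [dist(pivot, a) for a in left]
    right_to_pivot = [dist(pivot, b) for b in right]

    pairs: List[Tuple[Point, Point]] = []
    evaluations = 0
    for a, da in zip(left, left_to_pivot):
        for b, db in zip(right, right_to_pivot):
            lower = abs(da - db)  # d(a, b) >= |d(p, a) - d(p, b)|
            upper = da + db       # d(a, b) <= d(p, a) + d(p, b)
            if lower > radius:
                continue          # cannot qualify: discard without computing
            if upper <= radius:
                pairs.append((a, b))  # must qualify: accept without computing
                continue
            evaluations += 1      # bounds are inconclusive: compute d(a, b)
            if dist(a, b) <= radius:
                pairs.append((a, b))
    return pairs, evaluations
```

Calling `range_join_double_pruning(left, right, radius=1.0, pivot=(0.0, 0.0))` returns the same pairs as a plain nested loop. How many evaluations are saved depends on the data, the radius, and the pivot choice; the sketch makes no claim about the savings reported in the paper.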
id |
UNSP_376c0eb6e80615dba58425341b591cf5 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/179874 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
affiliations |
Department of Informatics, Federal University of Technology-UTFPR; Department of Mathematics, Federal University of Technology-UTFPR; São Paulo State University-UNESP, Bairro: Centro Educacional, Rua Roberto Simonsen, 305 |
author |
Pola, Ives Renê Venturini |
author_role |
author |
author2 |
Pola, Fernanda Paula Barbosa Eler, Danilo Medeiros [UNESP] |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
Federal University of Technology-UTFPR Universidade Estadual Paulista (Unesp) |
dc.subject.por.fl_str_mv |
Information retrieval Metric indexing Similarity joins |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-12-11T17:37:07Z 2018-12-11T17:37:07Z 2018-05-17 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.3390/info9050124 Information (Switzerland), v. 9, n. 5, 2018. 2078-2489 http://hdl.handle.net/11449/179874 10.3390/info9050124 2-s2.0-85047145811 2-s2.0-85047145811.pdf |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
Information (Switzerland) 0,222 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |