Optimizing similarity queries in metric spaces meeting user\'s expectation

Detalhes bibliográficos
Autor(a) principal: Ferreira, Mônica Ribeiro Porto
Data de Publicação: 2012
Tipo de documento: Tese
Idioma: eng
Título da fonte: Biblioteca Digital de Teses e Dissertações da USP
Texto Completo: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012013-091242/
Resumo: The complexity of data stored in large databases has increased at very fast paces. Hence, operations more elaborated than traditional queries are essential in order to extract all required information from the database. Therefore, the interest of the database community in similarity search has increased significantly. Two of the well-known types of similarity search are the Range (\'R IND. q\') and the k-Nearest Neighbor (\'kNN IND. q\') queries, which, as any of the traditional ones, can be sped up by indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about data are collected and employed to adjust the parameters of the search algorithms in each query execution. However, although the integration of similarity search into DBMS has begun to be deeply studied more recently, the query optimization has been developed and employed just to answer traditional queries. The execution of similarity queries, even using efficient indexing structures, tends to present higher computational cost than the execution of traditional ones. Two strategies can be applied to speed up the execution of any query, and thus they are worth to employ to answer also similarity queries. The first strategy is query rewriting based on algebraic properties and cost functions. The second technique is when external query factors are applied, such as employing the semantic expected by the user, to prune the answer space. This thesis aims at contributing to the development of novel techniques to improve the similarity-based query optimization processing, exploiting both algebraic properties and semantic restrictions as query refinements
id USP_d89ea3aec558bcc6ab0d439325f62af8
oai_identifier_str oai:teses.usp.br:tde-24012013-091242
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
spelling Optimizing similarity queries in metric spaces meeting user\'s expectationOtimização de operações de busca por similaridade em espaços métricosÁlgebra por similaridadeConsultas por similaridadeEspaços métricosExpectativa do usuárioMetric spacesOtimização de consultas por similaridadeSimilarity algebraSimilarity queriesSimilarity query optimizationUser's expectationThe complexity of data stored in large databases has increased at very fast paces. Hence, operations more elaborated than traditional queries are essential in order to extract all required information from the database. Therefore, the interest of the database community in similarity search has increased significantly. Two of the well-known types of similarity search are the Range (\'R IND. q\') and the k-Nearest Neighbor (\'kNN IND. q\') queries, which, as any of the traditional ones, can be sped up by indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about data are collected and employed to adjust the parameters of the search algorithms in each query execution. However, although the integration of similarity search into DBMS has begun to be deeply studied more recently, the query optimization has been developed and employed just to answer traditional queries. The execution of similarity queries, even using efficient indexing structures, tends to present higher computational cost than the execution of traditional ones. Two strategies can be applied to speed up the execution of any query, and thus they are worth to employ to answer also similarity queries. The first strategy is query rewriting based on algebraic properties and cost functions. The second technique is when external query factors are applied, such as employing the semantic expected by the user, to prune the answer space. This thesis aims at contributing to the development of novel techniques to improve the similarity-based query optimization processing, exploiting both algebraic properties and semantic restrictions as query refinementsA complexidade dos dados armazenados em grandes bases de dados tem aumentado sempre, criando a necessidade de novas operações de consulta. Uma classe de operações de crescente interesse são as consultas por similaridade, das quais as mais conhecidas são as consultas por abrangência (\'R IND. q\') e por k-vizinhos mais próximos (\'kNN IND. q\'). Qualquer consulta e agilizada pelas estruturas de indexação dos Sistemas de Gerenciamento de Bases de Dados (SGBDs). Outro modo de agilizar as operações de busca e a manutenção de métricas sobre os dados, que são utilizadas para ajustar parâmetros dos algoritmos de busca em cada consulta, num processo conhecido como otimização de consultas. Como as buscas por similaridade começaram a ser estudadas seriamente para integração em SGBDs muito mais recentemente do que as buscas tradicionais, a otimização de consultas, por enquanto, e um recurso que tem sido utilizado para responder apenas a consultas tradicionais. Mesmo utilizando as melhores estruturas existentes, a execução de consultas por similaridade tende a ser mais custosa do que as operações tradicionais. Assim, duas estratégias podem ser utilizadas para agilizar a execução de qualquer consulta e, assim, podem ser empregadas também para responder às consultas por similaridade. A primeira estratégia e a reescrita de consultas baseada em propriedades algébricas e em funções de custo. A segunda técnica faz uso de fatores externos à consulta, tais como a semântica esperada pelo usuário, para restringir o espaço das respostas. Esta tese pretende contribuir para o desenvolvimento de técnicas que melhorem o processo de otimização de consultas por similaridade, explorando propriedades algebricas e restrições semânticas como refinamento de consultasBiblioteca Digitais de Teses e Dissertações da USPTraina Junior, CaetanoFerreira, Mônica Ribeiro Porto2012-10-22info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisapplication/pdfhttp://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012013-091242/reponame:Biblioteca Digital de Teses e Dissertações da USPinstname:Universidade de São Paulo (USP)instacron:USPLiberar o conteúdo para acesso público.info:eu-repo/semantics/openAccesseng2016-07-28T16:10:32Zoai:teses.usp.br:tde-24012013-091242Biblioteca Digital de Teses e Dissertaçõeshttp://www.teses.usp.br/PUBhttp://www.teses.usp.br/cgi-bin/mtd2br.plvirginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.bropendoar:27212016-07-28T16:10:32Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)false
dc.title.none.fl_str_mv Optimizing similarity queries in metric spaces meeting user\'s expectation
Otimização de operações de busca por similaridade em espaços métricos
title Optimizing similarity queries in metric spaces meeting user\'s expectation
spellingShingle Optimizing similarity queries in metric spaces meeting user\'s expectation
Ferreira, Mônica Ribeiro Porto
Álgebra por similaridade
Consultas por similaridade
Espaços métricos
Expectativa do usuário
Metric spaces
Otimização de consultas por similaridade
Similarity algebra
Similarity queries
Similarity query optimization
User's expectation
title_short Optimizing similarity queries in metric spaces meeting user\'s expectation
title_full Optimizing similarity queries in metric spaces meeting user\'s expectation
title_fullStr Optimizing similarity queries in metric spaces meeting user\'s expectation
title_full_unstemmed Optimizing similarity queries in metric spaces meeting user\'s expectation
title_sort Optimizing similarity queries in metric spaces meeting user\'s expectation
author Ferreira, Mônica Ribeiro Porto
author_facet Ferreira, Mônica Ribeiro Porto
author_role author
dc.contributor.none.fl_str_mv Traina Junior, Caetano
dc.contributor.author.fl_str_mv Ferreira, Mônica Ribeiro Porto
dc.subject.por.fl_str_mv Álgebra por similaridade
Consultas por similaridade
Espaços métricos
Expectativa do usuário
Metric spaces
Otimização de consultas por similaridade
Similarity algebra
Similarity queries
Similarity query optimization
User's expectation
topic Álgebra por similaridade
Consultas por similaridade
Espaços métricos
Expectativa do usuário
Metric spaces
Otimização de consultas por similaridade
Similarity algebra
Similarity queries
Similarity query optimization
User's expectation
description The complexity of data stored in large databases has increased at very fast paces. Hence, operations more elaborated than traditional queries are essential in order to extract all required information from the database. Therefore, the interest of the database community in similarity search has increased significantly. Two of the well-known types of similarity search are the Range (\'R IND. q\') and the k-Nearest Neighbor (\'kNN IND. q\') queries, which, as any of the traditional ones, can be sped up by indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about data are collected and employed to adjust the parameters of the search algorithms in each query execution. However, although the integration of similarity search into DBMS has begun to be deeply studied more recently, the query optimization has been developed and employed just to answer traditional queries. The execution of similarity queries, even using efficient indexing structures, tends to present higher computational cost than the execution of traditional ones. Two strategies can be applied to speed up the execution of any query, and thus they are worth to employ to answer also similarity queries. The first strategy is query rewriting based on algebraic properties and cost functions. The second technique is when external query factors are applied, such as employing the semantic expected by the user, to prune the answer space. This thesis aims at contributing to the development of novel techniques to improve the similarity-based query optimization processing, exploiting both algebraic properties and semantic restrictions as query refinements
publishDate 2012
dc.date.none.fl_str_mv 2012-10-22
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012013-091242/
url http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012013-091242/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Liberar o conteúdo para acesso público.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Liberar o conteúdo para acesso público.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815257380937531392