Estudo de Técnicas para Indexação e Recuperação de Sequências Numéricas: Segmentação Adaptativa e Processamento de Consultas em Lote

Brito, Luiz Fernando Afra


Bibliographic details
Main author:	Brito, Luiz Fernando Afra
Publication date:	2018
Document type:	Master's thesis (Dissertação)
Language:	Portuguese (por)
Source title:	Repositório Institucional da UFU
Full text:	https://repositorio.ufu.br/handle/123456789/21300 (DOI: http://dx.doi.org/10.14393/ufu.di.2018.253)
Abstract:	Indexing structures and specialized search algorithms support similarity queries. According to the current literature, similarity queries should be fast and require as little space as possible. In this master's thesis, we studied two approaches to meet these requirements in the context of numeric sequences. In the first approach, we proposed two representations that approximate sequences and yield lower-bounding measures for the Euclidean distance: Error-Bounded Piecewise Linear Approximation (EBPLA) and Adaptive Indexable Piecewise Linear Approximation (AIPLA). As a novelty, both representations store a set of coefficients whose size adapts to the characteristics of each sequence. In our experiments, EBPLA, although flexible, had a high approximation error and, consequently, its lower bound was less efficient than those of the other representations. AIPLA achieved the lowest approximation error, and its lower bound was comparable to those of well-known representations such as Piecewise Aggregate Approximation (PAA) and Indexable Piecewise Linear Approximation (IPLA). In the second approach, we grouped query sequences submitted in batches in order to reduce similarity query time: we first formed groups of queries and then traversed indexing structures, such as R-trees and M-trees, only once per group. We evaluated five strategies for grouping the queries. The results indicate that the best overall strategy, i.e., the one that saves the most secondary-memory accesses, is to place all queries in a single group. However, for large batches this strategy can considerably increase primary-memory usage. Therefore, when primary memory is limited, we suggest the strategy that creates N clusters from N initial query sequences chosen at random.
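The abstract states that EBPLA and AIPLA store a number of coefficients that adapts to each sequence. The thesis's own segmentation algorithms are not described in this record, so the sketch below uses a different, generic technique purely as an illustration: a greedy sliding-window piecewise linear approximation that grows each segment while its least-squares line stays within a maximum absolute error `max_err`. The function names (`fit_line`, `max_residual`, `error_bounded_segments`) and the error criterion are assumptions, not the EBPLA or AIPLA definitions.

```python
from typing import Sequence

# Generic sliding-window PLA, used here as an assumption for illustration.
# It is NOT the thesis's EBPLA or AIPLA algorithm.


def fit_line(y: Sequence[float]) -> tuple[float, float]:
    """Least-squares line y ~ a*t + b over t = 0..m-1; returns (a, b)."""
    m = len(y)
    if m == 1:
        return 0.0, float(y[0])
    t_mean = (m - 1) / 2
    y_mean = sum(y) / m
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(m))
    a = num / den
    return a, y_mean - a * t_mean


def max_residual(y: Sequence[float]) -> float:
    """Largest absolute deviation of y from its own least-squares line."""
    a, b = fit_line(y)
    return max(abs(v - (a * t + b)) for t, v in enumerate(y))


def error_bounded_segments(y: Sequence[float], max_err: float) -> list[tuple[int, int, float, float]]:
    """Greedily grow each segment [start, end) while its line fit stays within max_err.

    Returns (start, end, slope, intercept) per segment, so the number of stored
    coefficients depends on the shape of y rather than on a fixed budget.
    """
    segments = []
    start, n = 0, len(y)
    while start < n:
        end = start + 1
        # Extend by one point at a time while the enlarged window still fits.
        while end < n and max_residual(y[start:end + 1]) <= max_err:
            end += 1
        a, b = fit_line(y[start:end])
        segments.append((start, end, a, b))
        start = end
    return segments
```

With the same bound, a perfectly straight sequence collapses to one segment (two coefficients), while an irregular one needs many, which is the kind of size adaptivity the abstract refers to. Refitting the window at every step makes this sketch quadratic per segment; it trades speed for clarity.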
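PAA, one of the baselines named in the abstract, yields a well-known lower bound for the Euclidean distance: for sequences of length n reduced to w frame means, LB(Q, C) = sqrt(n / w) * sqrt(sum_i (Qbar_i - Cbar_i)^2) <= D(Q, C). The minimal sketch below, assuming n is a multiple of w, implements that bound and the usual filter-and-refine range query it enables. It illustrates the baseline only, not AIPLA or EBPLA, and the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(q: Sequence[float], c: Sequence[float]) -> float:
    """Exact Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def paa(x: Sequence[float], w: int) -> list[float]:
    """Piecewise Aggregate Approximation: the mean of each of w equal-width frames."""
    n = len(x)
    if n % w != 0:
        raise ValueError("this sketch assumes len(x) is a multiple of w")
    size = n // w
    return [sum(x[i * size:(i + 1) * size]) / size for i in range(w)]


def lb_paa(q: Sequence[float], c: Sequence[float], w: int) -> float:
    """Lower bound of euclidean(q, c) computed from the w PAA coefficients only."""
    qa, ca = paa(q, w), paa(c, w)
    return math.sqrt(len(q) / w) * math.sqrt(sum((a - b) ** 2 for a, b in zip(qa, ca)))


def range_query(db: list, q: Sequence[float], eps: float, w: int) -> list[int]:
    """Filter-and-refine: prune with the lower bound, confirm with the exact distance.

    A real index would store the PAA vectors of db; they are recomputed here for brevity.
    """
    hits = []
    for idx, c in enumerate(db):
        if lb_paa(q, c, w) > eps:
            continue  # safe: the true distance is at least the bound, so c cannot qualify
        if euclidean(q, c) <= eps:
            hits.append(idx)
    return hits
```

For example, with q = [1, 2, 3, 4, 5, 6, 7, 8], c = [2, 2, 2, 2, 8, 8, 8, 8] and w = 2, the bound is sqrt(10) while the true distance is sqrt(20). Because the bound never exceeds the true distance, pruning causes no false dismissals; a tighter bound, the property the abstract compares across representations, discards more candidates before the exact distance is computed.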
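The batch-mode approach can be illustrated with a small, hypothetical sketch. `group_random_seeds` picks N queries at random as seeds and assigns every query to its nearest seed. This is a single k-means-style assignment pass; the thesis's exact clustering procedure is an assumption here. `batch_range_query` then answers Euclidean range queries with one pass over the data per group, a flat scan standing in for a single R-tree or M-tree traversal. Each group is summarized by its centroid c and radius R. By the triangle inequality, an object o with d(o, c) > R + eps cannot lie within eps of any query in the group, so it is skipped for all of them at once.

```python
import math
import random
from typing import Sequence

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_random_seeds(queries: list, n_groups: int, rng: random.Random) -> list[list[int]]:
    """Hypothetical grouping: n_groups random queries act as seeds; each query joins its nearest seed."""
    seeds = rng.sample(range(len(queries)), n_groups)
    groups: list[list[int]] = [[] for _ in seeds]
    for i, q in enumerate(queries):
        nearest = min(range(n_groups), key=lambda g: dist(q, queries[seeds[g]]))
        groups[nearest].append(i)
    return [g for g in groups if g]  # empty groups occur only when identical queries tie


def batch_range_query(data: list, queries: list, groups: list[list[int]], eps: float) -> dict[int, list[int]]:
    """Answer every range query (radius eps) with one pass over `data` per group."""
    results: dict[int, list[int]] = {i: [] for i in range(len(queries))}
    for members in groups:
        dim = len(queries[members[0]])
        center = [sum(queries[i][k] for i in members) / len(members) for k in range(dim)]
        radius = max(dist(center, queries[i]) for i in members)
        for j, obj in enumerate(data):  # stand-in for a single index traversal
            # For every q in the group: d(obj, q) >= d(obj, center) - radius.
            if dist(obj, center) > radius + eps:
                continue
            for i in members:
                if dist(obj, queries[i]) <= eps:
                    results[i].append(j)
    return results
```

Setting `n_groups=1` mirrors the single-group strategy, in which one pass serves the whole batch but the group radius, and therefore the pruning region, grows with the batch. Larger values keep groups tighter at the cost of scanning the data more times; in every case the answers match those of running each query separately.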

Item metadata

id	UFU_a43c9955623c7afc5c2915a6d5c1cd0d
oai_identifier_str	oai:repositorio.ufu.br:123456789/21300
network_acronym_str	UFU
network_name_str	Repositório Institucional da UFU
Funding:	CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Coordination for the Improvement of Higher Education Personnel)
OAI-PMH endpoint:	http://repositorio.ufu.br/oai/request
dc.title.none.fl_str_mv	Estudo de Técnicas para Indexação e Recuperação de Sequências Numéricas: Segmentação Adaptativa e Processamento de Consultas em Lote; Study of Techniques for Indexing and Retrieval of Numerical Sequences: Adaptive Segmentation and Batch-mode Similarity Query
author	Brito, Luiz Fernando Afra
author_role	author
dc.contributor.none.fl_str_mv	Albertini, Marcelo Keese; Razente, Humberto Luiz; Rios, Ricardo Araújo
dc.subject.por.fl_str_mv	Agrupamento (Clustering); Busca em lote (Batch-mode search); Sequência (Sequence); Consulta por similaridade (Similarity query); Redução de dimensionalidade (Dimensionality reduction); Indexação (Indexing); Lower bounding; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO::BANCO DE DADOS (Exact and Earth Sciences > Computer Science > Computing Methodology and Techniques > Databases)
publishDate	2018
dc.date.none.fl_str_mv	2018-05-08T17:48:44Z; 2018-05-08T17:48:44Z; 2018-03-08
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	BRITO, Luiz Fernando Afra. Estudo de Técnicas para Indexação e Recuperação de Sequências Numéricas: Segmentação Adaptativa e Processamento de Consultas em Lote. Uberlândia, 2018. 107 leaves. Master's thesis (Computer Science), Universidade Federal de Uberlândia, Uberlândia, 2018; https://repositorio.ufu.br/handle/123456789/21300; http://dx.doi.org/10.14393/ufu.di.2018.253
dc.language.iso.fl_str_mv	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Uberlândia; Brazil; Programa de Pós-graduação em Ciência da Computação (Graduate Program in Computer Science)
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFU; instname:Universidade Federal de Uberlândia (UFU); instacron:UFU
repository.name.fl_str_mv	Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU)
repository.mail.fl_str_mv	diinf@dirbi.ufu.br