Large scale similarity-based time series mining

Bibliographic details
Main author: Silva, Diego Furtado
Publication date: 2017
Document type: Thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-07122017-161346/
Abstract: Time series are ubiquitous in everyday human life. A wide range of application domains, such as medicine, biology, economics, and signal processing, generate data arranged in time. Due to the great interest in time series, a large variety of methods for mining temporal data has been proposed in recent decades. Several of these methods have one characteristic in common: at their core, there is a (dis)similarity function used to compare the time series. Dynamic Time Warping (DTW) is arguably the most relevant, studied, and applied distance measure for time series analysis. The main drawback of DTW is its computational complexity. At the same time, a significant number of data mining tasks, such as motif discovery, require a quadratic number of distance computations. These tasks are time intensive even for less expensive distance measures, such as the Euclidean distance. This thesis focuses on developing fast algorithms that allow large-scale analysis of temporal data, using similarity-based methods for time series data mining. The contributions of this work have implications for several data mining tasks, such as classification, clustering, and motif discovery. Specifically, the main contributions of this thesis are the following: (i) an algorithm to speed up the exact DTW calculation and its embedding into the similarity search procedure; (ii) a novel DTW-based distance that is invariant to spurious prefixes and suffixes; (iii) a music similarity representation with implications for several music mining tasks, and a fast algorithm to compute it; and (iv) an efficient, anytime method to find motifs and discords under the proposed prefix and suffix invariant DTW.
id USP_88b7450035c040f7f09333d0fd97716c
oai_identifier_str oai:teses.usp.br:tde-07122017-161346
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Large scale similarity-based time series mining
Mineração de séries temporais por similaridade em larga escala
author Silva, Diego Furtado
author_facet Silva, Diego Furtado
author_role author
dc.contributor.none.fl_str_mv Batista, Gustavo Enrique de Almeida Prado Alves
Keogh, Eamonn John
dc.contributor.author.fl_str_mv Silva, Diego Furtado
dc.subject.por.fl_str_mv Data mining
Dynamic Time Warping
Similarity measures
Time series
publishDate 2017
dc.date.none.fl_str_mv 2017-09-25
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://www.teses.usp.br/teses/disponiveis/55/55134/tde-07122017-161346/
url http://www.teses.usp.br/teses/disponiveis/55/55134/tde-07122017-161346/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815256773475434496