Hierarchical Clustering of Time-Series Data Streams

Bibliographic details
Main author: João Pedro Pedroso
Publication date: 2008
Other authors: Pedro Pereira Rodrigues, João Gama
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://repositorio.inesctec.pt/handle/123456789/3100
Abstract: This paper presents and analyzes an incremental system for clustering streaming time series. The Online Divisive-Agglomerative Clustering (ODAC) system continuously maintains a tree-like hierarchy of clusters that evolves with the data. ODAC uses a top-down strategy. The splitting criterion is a correlation-based dissimilarity measure among time series, splitting each node by the farthest pair of streams, which defines the diameter of the cluster. In stationary environments, expanding the structure leads to a decrease in the diameters of the clusters. The system uses a merge operator, which agglomerates two sibling clusters, in order to react to changes in the correlation structure between time series. The split and merge operators are triggered in response to changes in the diameters of existing clusters. The system is designed to process thousands of data streams that flow at a high rate. The main features of the system include update time and memory consumption that do not depend on the number of examples in the stream. Moreover, the time and memory required to process an example decrease whenever the cluster structure expands. Experimental results on artificial and real data assess the processing qualities of the system, suggesting competitive performance on clustering streaming time series, and also explore its ability to deal with concept drift.
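The following is a minimal Python sketch of the idea summarized in the abstract, not the authors' implementation. It assumes the correlation-based dissimilarity is sqrt((1 - r) / 2) with r the Pearson correlation (the abstract does not give the exact formula), and it assumes that, on a split, each remaining stream joins the nearer of the two farthest streams. The names PairwiseCorrelation and split_by_farthest_pair are hypothetical. Only running sums are stored, so memory depends on the number of streams and not on the number of examples seen.

```python
import math
from itertools import combinations


class PairwiseCorrelation:
    """Running sums for every pair of streams in one cluster (hypothetical helper)."""

    def __init__(self, names):
        self.names = list(names)
        self.n = 0
        self.sum = {a: 0.0 for a in self.names}
        self.sum_sq = {a: 0.0 for a in self.names}
        self.sum_prod = {p: 0.0 for p in combinations(self.names, 2)}

    def update(self, example):
        """Fold in one example: a dict mapping stream name to its current value."""
        self.n += 1
        for a in self.names:
            self.sum[a] += example[a]
            self.sum_sq[a] += example[a] ** 2
        for a, b in self.sum_prod:
            self.sum_prod[(a, b)] += example[a] * example[b]

    def correlation(self, a, b):
        """Pearson correlation computed from the running sums alone."""
        key = (a, b) if (a, b) in self.sum_prod else (b, a)
        num = self.sum_prod[key] - self.sum[a] * self.sum[b] / self.n
        var_a = max(0.0, self.sum_sq[a] - self.sum[a] ** 2 / self.n)
        var_b = max(0.0, self.sum_sq[b] - self.sum[b] ** 2 / self.n)
        den = math.sqrt(var_a * var_b)
        # Assumption: a constant stream is treated as uncorrelated with the others.
        return num / den if den > 0 else 0.0

    def dissimilarity(self, a, b):
        """Assumed form sqrt((1 - r) / 2): 0 for r = 1, 1 for r = -1."""
        r = max(-1.0, min(1.0, self.correlation(a, b)))
        return math.sqrt((1.0 - r) / 2.0)

    def diameter(self):
        """Return (distance, pair) for the farthest pair of streams."""
        return max((self.dissimilarity(a, b), (a, b)) for a, b in self.sum_prod)


def split_by_farthest_pair(stats):
    """Use the farthest pair as pivots; assign every other stream to the nearer pivot."""
    _, (p, q) = stats.diameter()
    left, right = [p], [q]
    for s in stats.names:
        if s in (p, q):
            continue
        nearer_p = stats.dissimilarity(s, p) <= stats.dissimilarity(s, q)
        (left if nearer_p else right).append(s)
    return left, right


if __name__ == "__main__":
    stats = PairwiseCorrelation(["x", "y", "z"])
    for t in range(1, 5):
        stats.update({"x": t, "y": 2 * t, "z": 5 - t})
    print(stats.diameter())
    print(split_by_farthest_pair(stats))
```

The sketch leaves out when to split or merge. The abstract says only that both operators are triggered by changes in cluster diameters, so any threshold or statistical test used to make that decision would be a further assumption.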
id RCAP_e59722708c6fd022a5430e8f0ce5be17
oai_identifier_str oai:repositorio.inesctec.pt:123456789/3100
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.date.none.fl_str_mv 2008-01-01T00:00:00Z
2008
2017-11-17T11:36:34Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.rights.driver.fl_str_mv info:eu-repo/semantics/embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP