Hierarchical Clustering of Time-Series Data Streams
Main author: | João Pedro Pedroso |
---|---|
Publication date: | 2008 |
Other authors: | Pedro Pereira Rodrigues, João Gama |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://repositorio.inesctec.pt/handle/123456789/3100 |
Abstract: | This paper presents and analyzes an incremental system for clustering streaming time series. The Online Divisive-Agglomerative Clustering (ODAC) system continuously maintains a tree-like hierarchy of clusters that evolves with the data. ODAC uses a top-down strategy. The splitting criterion is a correlation-based dissimilarity measure among time series: each node is split by its farthest pair of streams, which defines the diameter of the cluster. In stationary environments, expanding the structure leads to a decrease in the diameters of the clusters. The system uses a merge operator, which agglomerates two sibling clusters, in order to react to changes in the correlation structure between time series. The split and merge operators are triggered in response to changes in the diameters of existing clusters. The system is designed to process thousands of data streams that flow at a high rate. The main features of the system include update time and memory consumption that do not depend on the number of examples in the stream. Moreover, the time and memory required to process an example decrease whenever the cluster structure expands. Experimental results on artificial and real data assess the processing qualities of the system, suggesting competitive performance on clustering streaming time series, and also explore its ability to deal with concept drift. |
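
The splitting idea described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the class `PairwiseStats`, the function `split`, and the exact dissimilarity formula `sqrt((1 - corr) / 2)` are assumptions chosen for illustration, since the abstract only states that the measure is correlation-based. The sketch keeps running sums per stream and per pair of streams, so memory does not grow with the number of examples, and it splits a cluster around its farthest pair of streams.

```python
import math
from itertools import combinations


class PairwiseStats:
    """Running sums sufficient to compute Pearson correlation for every pair
    of streams in one cluster (hypothetical helper, not from the paper)."""

    def __init__(self, names):
        self.names = list(names)
        self.n = 0
        self.s = {a: 0.0 for a in self.names}    # sum of values
        self.s2 = {a: 0.0 for a in self.names}   # sum of squared values
        self.sp = {p: 0.0 for p in combinations(self.names, 2)}  # sum of products

    def update(self, example):
        """Absorb one example: a dict mapping stream name -> value."""
        self.n += 1
        for a in self.names:
            x = example[a]
            self.s[a] += x
            self.s2[a] += x * x
        for a, b in self.sp:
            self.sp[(a, b)] += example[a] * example[b]

    def corr(self, a, b):
        key = (a, b) if (a, b) in self.sp else (b, a)
        n = self.n
        num = self.sp[key] - self.s[a] * self.s[b] / n
        den = math.sqrt(self.s2[a] - self.s[a] ** 2 / n) * math.sqrt(
            self.s2[b] - self.s[b] ** 2 / n
        )
        # Assumption: treat a constant stream as uncorrelated with everything.
        return num / den if den > 0 else 0.0

    def dissimilarity(self, a, b):
        # Assumed measure: 0 for perfectly correlated streams, 1 for
        # perfectly anti-correlated ones. Clamp against rounding error.
        c = max(-1.0, min(1.0, self.corr(a, b)))
        return math.sqrt((1.0 - c) / 2.0)

    def farthest_pair(self):
        """Return the pair with the largest dissimilarity and that value,
        which plays the role of the cluster diameter."""
        pair = max(self.sp, key=lambda p: self.dissimilarity(*p))
        return pair, self.dissimilarity(*pair)


def split(stats):
    """Split a cluster around its farthest pair: each remaining stream
    joins the pivot it is least dissimilar to."""
    (p1, p2), diameter = stats.farthest_pair()
    left, right = [p1], [p2]
    for a in stats.names:
        if a in (p1, p2):
            continue
        closer_to_p1 = stats.dissimilarity(a, p1) <= stats.dissimilarity(a, p2)
        (left if closer_to_p1 else right).append(a)
    return left, right, diameter
```

In this sketch the state held per cluster is proportional to the number of stream pairs, not to the stream length. After a split, each child would track only the pairs among its own streams, which is consistent with the abstract's remark that per-example cost decreases as the structure expands. The conditions that trigger a split or a merge are not specified in the abstract and are left out here.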
id |
RCAP_e59722708c6fd022a5430e8f0ce5be17 |
---|---|
oai_identifier_str |
oai:repositorio.inesctec.pt:123456789/3100 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
title |
Hierarchical Clustering of Time-Series Data Streams |
author |
João Pedro Pedroso |
author_role |
author |
author2 |
Pedro Pereira Rodrigues João Gama |
author2_role |
author author |
publishDate |
2008 |
dc.date.none.fl_str_mv |
2008-01-01T00:00:00Z 2008 2017-11-17T11:36:34Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.inesctec.pt/handle/123456789/3100 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/embargoedAccess |
eu_rights_str_mv |
embargoedAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
_version_ |
1799131598528970752 |