An evolutionary algorithm for clustering data streams with a variable number of clusters

Detalhes bibliográficos
Autor(a) principal: Silva,JD
Data de Publicação: 2017
Outros Autores: Hruschka,ER, João Gama
Tipo de documento: Artigo
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://repositorio.inesctec.pt/handle/123456789/5342
http://dx.doi.org/10.1016/j.eswa.2016.09.020
Resumo: Several algorithms for clustering data streams based on k-Means have been proposed in the literature. However, most of them assume that the number of clusters, k, is known a priori by the user and can be kept fixed throughout the data analySis process. Besides the difficulty in choosing k, data stream clustering imposes several challenges to be addressed, such as addressing non-stationary, unbounded data that arrive in an online fashion. In this paper, we propose a Fast Evolutionary Algorithm for Clustering data streams (FEAC-Stream) that allows estimating k automatically from data in an online fashion. FEAC-Stream uses the Page-Hinkley Test to detect eventual degradation in the quality of the induced clusters, thereby triggering an evolutionary algorithm that re-estimates k accordingly. FEAC-Stream relies on the assumption that clusters of (partially unknown) data can provide useful information about the dynamics of the data stream. We illustrate the potential of FEAC-Stream in a set of experiments using both synthetic and real-world data streams, comparing it to four related algorithms, namely: CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk and StreamKM++-BkM. The obtained results show that FEAC-Stream provides good data partitions and that it can detect, and accordingly react to, data changes.
id RCAP_737b72adcfc0b34f83b956222c6930fd
oai_identifier_str oai:repositorio.inesctec.pt:123456789/5342
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling An evolutionary algorithm for clustering data streams with a variable number of clustersSeveral algorithms for clustering data streams based on k-Means have been proposed in the literature. However, most of them assume that the number of clusters, k, is known a priori by the user and can be kept fixed throughout the data analySis process. Besides the difficulty in choosing k, data stream clustering imposes several challenges to be addressed, such as addressing non-stationary, unbounded data that arrive in an online fashion. In this paper, we propose a Fast Evolutionary Algorithm for Clustering data streams (FEAC-Stream) that allows estimating k automatically from data in an online fashion. FEAC-Stream uses the Page-Hinkley Test to detect eventual degradation in the quality of the induced clusters, thereby triggering an evolutionary algorithm that re-estimates k accordingly. FEAC-Stream relies on the assumption that clusters of (partially unknown) data can provide useful information about the dynamics of the data stream. We illustrate the potential of FEAC-Stream in a set of experiments using both synthetic and real-world data streams, comparing it to four related algorithms, namely: CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk and StreamKM++-BkM. The obtained results show that FEAC-Stream provides good data partitions and that it can detect, and accordingly react to, data changes.2018-01-03T10:37:42Z2017-01-01T00:00:00Z2017info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://repositorio.inesctec.pt/handle/123456789/5342http://dx.doi.org/10.1016/j.eswa.2016.09.020engSilva,JDHruschka,ERJoão Gamainfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-05-15T10:20:31Zoai:repositorio.inesctec.pt:123456789/5342Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T17:53:16.168945Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv An evolutionary algorithm for clustering data streams with a variable number of clusters
title An evolutionary algorithm for clustering data streams with a variable number of clusters
spellingShingle An evolutionary algorithm for clustering data streams with a variable number of clusters
Silva,JD
title_short An evolutionary algorithm for clustering data streams with a variable number of clusters
title_full An evolutionary algorithm for clustering data streams with a variable number of clusters
title_fullStr An evolutionary algorithm for clustering data streams with a variable number of clusters
title_full_unstemmed An evolutionary algorithm for clustering data streams with a variable number of clusters
title_sort An evolutionary algorithm for clustering data streams with a variable number of clusters
author Silva,JD
author_facet Silva,JD
Hruschka,ER
João Gama
author_role author
author2 Hruschka,ER
João Gama
author2_role author
author
dc.contributor.author.fl_str_mv Silva,JD
Hruschka,ER
João Gama
description Several algorithms for clustering data streams based on k-Means have been proposed in the literature. However, most of them assume that the number of clusters, k, is known a priori by the user and can be kept fixed throughout the data analySis process. Besides the difficulty in choosing k, data stream clustering imposes several challenges to be addressed, such as addressing non-stationary, unbounded data that arrive in an online fashion. In this paper, we propose a Fast Evolutionary Algorithm for Clustering data streams (FEAC-Stream) that allows estimating k automatically from data in an online fashion. FEAC-Stream uses the Page-Hinkley Test to detect eventual degradation in the quality of the induced clusters, thereby triggering an evolutionary algorithm that re-estimates k accordingly. FEAC-Stream relies on the assumption that clusters of (partially unknown) data can provide useful information about the dynamics of the data stream. We illustrate the potential of FEAC-Stream in a set of experiments using both synthetic and real-world data streams, comparing it to four related algorithms, namely: CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk and StreamKM++-BkM. The obtained results show that FEAC-Stream provides good data partitions and that it can detect, and accordingly react to, data changes.
publishDate 2017
dc.date.none.fl_str_mv 2017-01-01T00:00:00Z
2017
2018-01-03T10:37:42Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://repositorio.inesctec.pt/handle/123456789/5342
http://dx.doi.org/10.1016/j.eswa.2016.09.020
url http://repositorio.inesctec.pt/handle/123456789/5342
http://dx.doi.org/10.1016/j.eswa.2016.09.020
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799131606917578752