Clustering stability and ground truth: numerical experiments

Bibliographic details
Main author: Amorim, M. J.
Publication date: 2015
Other authors: Cardoso, M. G. M. S.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/11351
Abstract: Stability has been considered an important property for evaluating clustering solutions. Nevertheless, there are no conclusive studies on the relationship between this property and the capacity to recover the clusters inherent in the data ("ground truth"). This study focuses on this relationship, resorting to experiments on synthetic data generated under diverse scenarios (controlling relevant factors) and to experiments on real data sets. Stability is evaluated using a weighted cross-validation procedure. Indices of agreement (corrected for agreement by chance) are used to assess both stability and external validity. The results reveal a perspective not previously mentioned in the literature. Although there is a clear relationship between stability and external validity when a broad range of scenarios is considered, the within-scenario conclusions deserve special attention: for a specific clustering problem (the situation faced in practice), there is no significant relationship between clustering stability and the ability to recover the data clusters.
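The abstract does not say which chance-corrected agreement index the authors use. As a minimal sketch, assuming the adjusted Rand index (a widely used index of this kind), the code below shows how a single function can score both stability (agreement between two clustering solutions) and external validity (agreement between a solution and the ground truth). The labels `truth`, `run_a` and `run_b` are hypothetical toy data, and the sketch does not reproduce the weighted cross-validation procedure of the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are needed to compare pairs")

    # Contingency counts: items put in cluster i by A and in cluster j by B.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # Number of item pairs grouped together jointly, by A alone, by B alone.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions consist of a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labels (not data from the study).
truth = [0, 0, 0, 1, 1, 1]  # known "ground truth" classes
run_a = [1, 1, 1, 0, 0, 0]  # same grouping as truth, different label names
run_b = [0, 0, 1, 1, 2, 2]  # a different, three-cluster solution

external_validity_a = adjusted_rand_index(truth, run_a)
external_validity_b = adjusted_rand_index(truth, run_b)
stability_ab = adjusted_rand_index(run_a, run_b)

print(f"external validity of run_a: {external_validity_a:.3f}")
print(f"external validity of run_b: {external_validity_b:.3f}")
print(f"agreement between run_a and run_b: {stability_ab:.3f}")
```

Because the index is corrected for chance, a value near 0 indicates agreement no better than random labelling and 1 indicates identical partitions, which makes stability and external-validity scores comparable on the same scale.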
id RCAP_e09b504c18e0ef0b0f2aaef67592701d
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/11351
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
dc.contributor.author.fl_str_mv Amorim, M. J.
Cardoso, M. G. M. S.
dc.subject.por.fl_str_mv Clustering
External validation
Stability
publishDate 2015
dc.date.none.fl_str_mv 2015-01-01T00:00:00Z
2015
2016-05-20T13:16:02Z
2019-05-16T11:32:08Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/11351
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv 2231-2021
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv RG Education Society
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP