Comparing clustering solutions: the use of adjusted paired indices

Bibliographic details
Main author: Amorim, Maria José de Pina da Cruz
Publication date: 2015
Other authors: Cardoso, Margarida G. M. S.
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10400.21/6191
Abstract: In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method, IADJUST, to correct indices of paired agreement by excluding agreement by chance. This method overcomes limitations of previous approaches known in the literature, since it permits the correction of any index. We illustrate its use in external clustering validation, to measure the agreement between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with the ground truth. We use simulated data sets under a range of scenarios (considering diverse numbers of clusters, cluster overlap and cluster balance) to discuss the pertinence and the precision of our proposal. Precision is established by comparison with the analytical approach to correction; specific indices that can be corrected in this way are used for this purpose. The pertinence of the proposed correction is discussed through a detailed comparison of the performance of two classical clustering approaches, namely the Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained.
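The chance correction described in the abstract can be illustrated with a small sketch. It assumes the standard correction-for-chance form, adjusted I = (I - E[I]) / (max I - E[I]) with max I = 1, and estimates E[I] by Monte Carlo under a permutation model (randomly reordering one labeling, which keeps the cluster sizes of both labelings fixed). This is an illustrative assumption, not necessarily the IADJUST procedure as published; the function names (`pair_counts`, `adjusted_mc`, `adjusted_rand_closed_form`) and the toy labelings are hypothetical. For the Rand index, the simulated correction can be checked against the closed-form adjusted Rand index of Hubert and Arabie, in the spirit of the precision check the abstract describes.

```python
import random
from collections import Counter
from math import comb


def pair_counts(u, v):
    """Pair counts (a, b, c, d) for two labelings of the same n objects.

    a: pairs placed together in both u and v
    b: pairs together in u but separated in v
    c: pairs separated in u but together in v
    d: pairs separated in both
    """
    if len(u) != len(v):
        raise ValueError("labelings must have the same length")
    a = sum(comb(m, 2) for m in Counter(zip(u, v)).values())
    together_u = sum(comb(m, 2) for m in Counter(u).values())
    together_v = sum(comb(m, 2) for m in Counter(v).values())
    b = together_u - a
    c = together_v - a
    d = comb(len(u), 2) - a - b - c
    return a, b, c, d


def rand_index(a, b, c, d):
    return (a + d) / (a + b + c + d)


def jaccard_index(a, b, c, d):
    return a / (a + b + c) if a + b + c else 1.0


def adjusted_mc(index, u, v, n_perm=2000, max_value=1.0, seed=0):
    """Chance-corrected index (I - E[I]) / (max_value - E[I]).

    E[I] is estimated by averaging the index over random permutations of v,
    which keep the cluster sizes of both labelings fixed (an assumption of
    this sketch, not necessarily the published IADJUST model).
    """
    rng = random.Random(seed)
    observed = index(*pair_counts(u, v))
    shuffled = list(v)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += index(*pair_counts(u, shuffled))
    expected = total / n_perm
    return (observed - expected) / (max_value - expected)


def adjusted_rand_closed_form(u, v):
    """Hubert-Arabie adjusted Rand index, for comparison with adjusted_mc."""
    a, b, c, _ = pair_counts(u, v)
    n_pairs = comb(len(u), 2)
    sum_u, sum_v = a + b, a + c  # pairs together in u, pairs together in v
    expected_a = sum_u * sum_v / n_pairs
    max_a = (sum_u + sum_v) / 2
    return (a - expected_a) / (max_a - expected_a)


# Hypothetical toy example: three true groups of five objects; one object
# of the first group is assigned to the cluster holding the second group.
truth = [0] * 5 + [1] * 5 + [2] * 5
found = [0] * 4 + [1] * 6 + [2] * 5
print("ARI (closed form):", round(adjusted_rand_closed_form(truth, found), 3))
print("Rand, MC-adjusted:", round(adjusted_mc(rand_index, truth, found), 3))
print("Jaccard, MC-adjusted:", round(adjusted_mc(jaccard_index, truth, found), 3))
```

Because the Rand index is linear in the pair count a, its Monte Carlo correction under this model should approach the closed-form adjusted Rand index as `n_perm` grows. The Jaccard index has a in its denominator, so its expectation does not follow from the same linear argument; that is the kind of index a simulation-based correction is meant to cover.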
Keywords: Adjusted indices; Indices of paired agreement; Clustering evaluation; External evaluation
Citation: AMORIM, Maria José; CARDOSO, Margarida G. M. S. - Comparing clustering solutions: the use of adjusted paired indices. Intelligent Data Analysis. ISSN 1088-467X. Vol. 19, No. 6 (2015), pp. 1275-1296
DOI: 10.3233/IDA-150782
Access rights: metadata only access
Format: application/pdf
Publisher: IOS Press