Comparing clustering solutions: the use of adjusted paired indices
Main author: | Amorim, Maria José de Pina da Cruz |
---|---|
Publication date: | 2015 |
Other authors: | Cardoso, Margarida G. M. S. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.21/6191 |
Abstract: | In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method - IADJUST - to correct indices of paired agreement, excluding agreement by chance. This new method overcomes limitations known in the literature, as it permits the correction of any index. We illustrate its use in external clustering validation, to measure the agreement between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with ground truth. We use simulated data sets, under a range of scenarios - considering diverse numbers of clusters, cluster overlaps and balances - to discuss the pertinence and the precision of our proposal. Precision is established by comparison with the analytical approach to correction; the specific indices that can be corrected analytically are used for this purpose. The pertinence of the proposed correction is discussed through a detailed comparison between the performance of two classical clustering approaches, namely the Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained. |
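
To make the idea of excluding agreement by chance concrete, here is a minimal Python sketch of a generic chance correction for a paired-agreement index, using the common form (observed - expected) / (maximum - expected). It is not taken from the paper and should not be read as the IADJUST procedure: the function names `rand_index` and `chance_adjusted`, the choice of the Rand index as the example, and the estimation of the expected value by randomly shuffling one label vector are all assumptions made for illustration.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree (Rand index)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # pair is joined in both or split in both
        total += 1
    return agree / total


def chance_adjusted(index, labels_a, labels_b, n_sim=1000, max_value=1.0, seed=0):
    """Return (observed - expected) / (max_value - expected) for `index`.

    The expected value under chance is estimated by shuffling labels_b,
    which keeps the cluster sizes of both partitions fixed. This null
    model is an assumption of the sketch, not a detail from the paper.
    The result is undefined when the expected value equals max_value
    (for example, when each partition has a single cluster).
    """
    rng = random.Random(seed)
    observed = index(labels_a, labels_b)
    shuffled = list(labels_b)
    total = 0.0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        total += index(labels_a, shuffled)
    expected = total / n_sim
    return (observed - expected) / (max_value - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # hypothetical ground truth
    found = [0, 0, 1, 1, 1, 1, 2, 2, 0]  # hypothetical clustering result
    print("raw index:     ", rand_index(truth, found))
    print("adjusted index:", chance_adjusted(rand_index, truth, found))
```

Because `chance_adjusted` takes the index as a function argument, any other pair-based index with a known maximum could be passed in the same way, which mirrors the abstract's point that a simulation-based correction need not be tied to one index.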
id |
RCAP_eebe293ae0e9921aec1cae186391a678 |
---|---|
oai_identifier_str |
oai:repositorio.ipl.pt:10400.21/6191 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
author |
Amorim, Maria José de Pina da Cruz |
author_facet |
Amorim, Maria José de Pina da Cruz; Cardoso, Margarida G. M. S. |
author_role |
author |
author2 |
Cardoso, Margarida G. M. S. |
author2_role |
author |
dc.contributor.none.fl_str_mv |
RCIPL |
dc.contributor.author.fl_str_mv |
Amorim, Maria José de Pina da Cruz; Cardoso, Margarida G. M. S. |
dc.subject.por.fl_str_mv |
Adjusted indices; Indices of paired agreement; Clustering evaluation; External evaluation |
topic |
Adjusted indices; Indices of paired agreement; Clustering evaluation; External evaluation |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015; 2015-01-01T00:00:00Z; 2016-05-20T10:43:19Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.21/6191 |
url |
http://hdl.handle.net/10400.21/6191 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
AMORIM, Maria José; CARDOSO, Margarida G. M. S. - Comparing clustering solutions: the use of adjusted paired indices. Intelligent Data Analysis. ISSN 1088-467X. Vol. 19, No. 6 (2015), pp. 1275-1296. DOI: 10.3233/IDA-150782 |
dc.rights.driver.fl_str_mv |
metadata only access; info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
metadata only access |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
IOS Press |
publisher.none.fl_str_mv |
IOS Press |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |