A model for clustering data from heterogeneous dissimilarities
Main author: | Santi, Éverton |
---|---|
Publication date: | 2016 |
Other authors: | Aloise, Daniel; Blanchard, Simon J. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFRN |
Full text: | https://repositorio.ufrn.br/handle/123456789/30633 |
Abstract: | Clustering algorithms partition a set of n objects into p groups (called clusters), such that objects assigned to the same group are homogeneous according to some criterion. To derive these clusters, the data input required is often a single n × n dissimilarity matrix. Yet for many applications, more than one instance of the dissimilarity matrix is available, and so, to conform to model requirements, it is common practice to aggregate (e.g., sum up, average) the matrices. This aggregation practice results in clustering solutions that mask the true nature of the original data. In this paper we introduce a clustering model which, to handle the heterogeneity, uses all available dissimilarity matrices and identifies groups of individuals who cluster the objects in a similar way. The model is a nonconvex problem and difficult to solve exactly, and we thus introduce a Variable Neighborhood Search heuristic to provide solutions efficiently. Computational experiments and an empirical application to perception of chocolate candy show that the heuristic algorithm is efficient and that the proposed model is suited for recovering heterogeneous data. Implications for clustering researchers are discussed. |
id |
UFRN_bbcc22f06736420702c0c201474c90fc |
---|---|
oai_identifier_str |
oai:https://repositorio.ufrn.br:123456789/30633 |
network_acronym_str |
UFRN |
network_name_str |
Repositório Institucional da UFRN |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
A model for clustering data from heterogeneous dissimilarities |
title |
A model for clustering data from heterogeneous dissimilarities |
author |
Santi, Éverton |
author_facet |
Santi, Éverton Aloise, Daniel Blanchard, Simon J. |
author_role |
author |
author2 |
Aloise, Daniel Blanchard, Simon J. |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Santi, Éverton Aloise, Daniel Blanchard, Simon J. |
dc.subject.por.fl_str_mv |
Heterogeneity Heuristics Data mining Clustering Optimization |
topic |
Heterogeneity Heuristics Data mining Clustering Optimization |
publishDate |
2016 |
dc.date.issued.fl_str_mv |
2016-09-16 |
dc.date.accessioned.fl_str_mv |
2020-11-23T15:27:39Z |
dc.date.available.fl_str_mv |
2020-11-23T15:27:39Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
SANTI, Éverton; ALOISE, Daniel; BLANCHARD, Simon J. A model for clustering data from heterogeneous dissimilarities. European Journal of Operational Research, [S.L.], v. 253, n. 3, p. 659-672, Sept. 2016. Available at: https://www.sciencedirect.com/science/article/abs/pii/S0377221716301618?via%3Dihub. Accessed: 8 Sept. 2020. http://dx.doi.org/10.1016/j.ejor.2016.03.033. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufrn.br/handle/123456789/30633 |
dc.identifier.issn.none.fl_str_mv |
0377-2217 |
dc.identifier.doi.none.fl_str_mv |
10.1016/j.ejor.2016.03.033 |
url |
https://repositorio.ufrn.br/handle/123456789/30633 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Elsevier |
publisher.none.fl_str_mv |
Elsevier |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRN instname:Universidade Federal do Rio Grande do Norte (UFRN) instacron:UFRN |
instname_str |
Universidade Federal do Rio Grande do Norte (UFRN) |
instacron_str |
UFRN |
institution |
UFRN |
reponame_str |
Repositório Institucional da UFRN |
collection |
Repositório Institucional da UFRN |
bitstream.url.fl_str_mv |
https://repositorio.ufrn.br/bitstream/123456789/30633/2/license_rdf https://repositorio.ufrn.br/bitstream/123456789/30633/3/license.txt https://repositorio.ufrn.br/bitstream/123456789/30633/4/ModelForClusteringData_2016.pdf.txt https://repositorio.ufrn.br/bitstream/123456789/30633/5/ModelForClusteringData_2016.pdf.jpg |
bitstream.checksum.fl_str_mv |
4d2950bda3d176f570a9f8b328dfbbef e9597aa2854d128fd968be5edc8a28d9 22ce5fa407e1c161abc961538ed3c77e cdfe60ab42036b294a44f9b9c371ca5b |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN) |
repository.mail.fl_str_mv |
|
_version_ |
1802117784933498880 |