Variable weighted fuzzy clustering algorithm for qualitative data

Bibliographic details
Main author: TEOTONIO, Gabriel Harrison Fidelis
Publication date: 2023
Document type: Master's thesis
Language: eng
Source title: Repositório Institucional da UFPE
dARK ID: ark:/64986/0013000010gz0
Full text: https://repositorio.ufpe.br/handle/123456789/53504
Abstract: This work addresses clustering methods within unsupervised learning, a challenging subdivision of machine learning in which no response variable is available. Clustering seeks groups in a dataset such that the observations in each group are similar to one another and different from those in other groups. K-Means, the best-known and most widely used clustering technique, handles quantitative variables efficiently, as do many other clustering methods. However, K-Means cannot be applied to qualitative variables such as gender or education level. The K-Modes method was proposed to overcome this limitation; it represents clusters by modes instead of means. Existing partitional clustering algorithms without variable weighting assign equal importance to all variables. This can be problematic when clustering high-dimensional, sparse data, where the cluster partition may be explained by a particular subset of variables. To address this issue, subspace clustering techniques and adaptive distances have been proposed, the latter derived from constraints on the sum and product of the weights that express the importance of the variables. This work proposes a new fuzzy clustering algorithm for qualitative data based on adaptive distances, which demonstrates improved performance compared to conventional methods. Local adaptive distances, which assign a different weight to each variable in each cluster, perform better on datasets with high levels of dispersion and class overlap. The results extend the capabilities of existing clustering algorithms based on adaptive distances.
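The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the thesis, which this record does not reproduce. It assumes a weighted simple-matching dissimilarity, a K-Modes style prototype (the per-variable mode), and the usual fuzzy c-means style membership formula with fuzzifier m. All function names, the example weights, and the toy data are hypothetical.

```python
from collections import Counter


def matching_distance(x, prototype, weights):
    """Weighted simple-matching dissimilarity (assumed form):
    the sum of the weights of the variables on which x and the prototype differ."""
    return sum(w for a, b, w in zip(x, prototype, weights) if a != b)


def cluster_mode(rows):
    """K-Modes style prototype: the most frequent category of each variable."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def fuzzy_memberships(x, prototypes, weights_per_cluster, m=2.0):
    """Membership of x in each cluster, fuzzy c-means style.
    weights_per_cluster holds one weight vector per cluster (a "local" distance)."""
    d = [matching_distance(x, p, w) for p, w in zip(prototypes, weights_per_cluster)]
    if any(v == 0 for v in d):
        # x coincides with at least one prototype: share membership among those.
        zeros = [v == 0 for v in d]
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / v) ** exponent for v in d]
    total = sum(inverse)
    return [v / total for v in inverse]


# Toy data with two qualitative variables: gender and education level.
rows = [("F", "high"), ("F", "low"), ("M", "high")]
mode = cluster_mode(rows)

prototypes = [("F", "high"), ("M", "low")]
# Hypothetical local weights; each vector has product 1, one of the
# constraint types the abstract mentions.
local_weights = [(2.0, 0.5), (2.0, 0.5)]
u = fuzzy_memberships(("F", "low"), prototypes, local_weights)
```

With equal weights the observation ("F", "low") is one mismatch away from both prototypes and would receive equal memberships. With the weights above, a mismatch on gender costs more than a mismatch on education level, so the membership shifts toward the first prototype.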
id UFPE_62613c1907c67db2d5d20f37ecb188c1
oai_identifier_str oai:repositorio.ufpe.br:123456789/53504
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str 2221
spelling TEOTONIO, Gabriel Harrison Fidelis | SOUZA, Renata Maria Cardoso Rodrigues de | AMARAL, Getúlio José Amorim do | CNPq | license.txt, text/plain; charset=utf-8, 2362 | DISSERTAÇAO Gabriel Harrison Fidelis Teotonio.pdf, application/pdf, 850600 | license_rdf, application/rdf+xml; charset=utf-8, 811 | DISSERTAÇAO Gabriel Harrison Fidelis Teotonio.pdf.txt, Extracted text, text/plain, 123623 | DISSERTAÇAO Gabriel Harrison Fidelis Teotonio.pdf.jpg, Generated Thumbnail, image/jpeg, 1232 | 2023-11-09 02:24:41.966 | https://repositorio.ufpe.br/oai/request | opendoar:2221
dc.title.pt_BR.fl_str_mv Variable weighted fuzzy clustering algorithm for qualitative data
title Variable weighted fuzzy clustering algorithm for qualitative data
spellingShingle Variable weighted fuzzy clustering algorithm for qualitative data
TEOTONIO, Gabriel Harrison Fidelis
Inteligência computacional
Agrupamento
title_short Variable weighted fuzzy clustering algorithm for qualitative data
title_full Variable weighted fuzzy clustering algorithm for qualitative data
title_fullStr Variable weighted fuzzy clustering algorithm for qualitative data
title_full_unstemmed Variable weighted fuzzy clustering algorithm for qualitative data
title_sort Variable weighted fuzzy clustering algorithm for qualitative data
author TEOTONIO, Gabriel Harrison Fidelis
author_facet TEOTONIO, Gabriel Harrison Fidelis
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/3723910313293363
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/9289080285504453
dc.contributor.advisor-coLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/7674916684282039
dc.contributor.author.fl_str_mv TEOTONIO, Gabriel Harrison Fidelis
dc.contributor.advisor1.fl_str_mv SOUZA, Renata Maria Cardoso Rodrigues de
dc.contributor.advisor-co1.fl_str_mv AMARAL, Getúlio José Amorim do
contributor_str_mv SOUZA, Renata Maria Cardoso Rodrigues de
AMARAL, Getúlio José Amorim do
dc.subject.por.fl_str_mv Inteligência computacional
Agrupamento
topic Computational intelligence (Inteligência computacional)
Clustering (Agrupamento)
publishDate 2023
dc.date.accessioned.fl_str_mv 2023-11-08T17:37:34Z
dc.date.available.fl_str_mv 2023-11-08T17:37:34Z
dc.date.issued.fl_str_mv 2023-05-25
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv TEOTONIO, Gabriel Harrison Fidelis. Variable weighted fuzzy clustering algorithm for qualitative data. 2023. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2023.
dc.identifier.uri.fl_str_mv https://repositorio.ufpe.br/handle/123456789/53504
dc.identifier.dark.fl_str_mv ark:/64986/0013000010gz0
identifier_str_mv TEOTONIO, Gabriel Harrison Fidelis. Variable weighted fuzzy clustering algorithm for qualitative data. 2023. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2023.
ark:/64986/0013000010gz0
url https://repositorio.ufpe.br/handle/123456789/53504
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/embargoedAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv embargoedAccess
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv UFPE
dc.publisher.country.fl_str_mv Brazil
publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
bitstream.url.fl_str_mv https://repositorio.ufpe.br/bitstream/123456789/53504/3/license.txt
https://repositorio.ufpe.br/bitstream/123456789/53504/1/DISSERTA%c3%87AO%20Gabriel%20Harrison%20Fidelis%20Teotonio.pdf
https://repositorio.ufpe.br/bitstream/123456789/53504/2/license_rdf
https://repositorio.ufpe.br/bitstream/123456789/53504/4/DISSERTA%c3%87AO%20Gabriel%20Harrison%20Fidelis%20Teotonio.pdf.txt
https://repositorio.ufpe.br/bitstream/123456789/53504/5/DISSERTA%c3%87AO%20Gabriel%20Harrison%20Fidelis%20Teotonio.pdf.jpg
bitstream.checksum.fl_str_mv 5e89a1613ddc8510c6576f4b23a78973
9db1e0aab784f7835ec207d58bb55c9a
e39d27027a6cc9cb039ad269a5db8e34
7f766fb275a3fd223c214a02792b5abb
295f459b2ebe7ec8b5843ff8d11df8a4
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1815172963303948288