Clustering algorithms with new automatic variables weighting
Main author: | RIZO RODRÍGUEZ, Sara Inés |
---|---|
Publication date: | 2022 |
Document type: | Thesis |
Language: | eng |
Source title: | Repositório Institucional da UFPE |
Full text: | https://repositorio.ufpe.br/handle/123456789/44859 |
Abstract: | Every day, large amounts of information are stored or represented as data for later analysis and management. Data analysis plays an indispensable role in understanding diverse phenomena, and one of the main ways of handling such data is to classify or group them into a set of categories or clusters. Clustering, or cluster analysis, aims to divide a collection of data items into clusters according to a measure of similarity. Clustering has been used in various fields, such as image processing, data mining, pattern recognition, and statistical analysis. Usually, clustering methods deal with objects described by real-valued variables. However, this representation is too restrictive for complex data such as lists, histograms, or intervals. Furthermore, in some problems many dimensions are irrelevant and can mask existing clusters; for example, groups may exist in different subsets of features. This work focuses on the cluster analysis of data points described by both real-valued and interval-valued variables. To this end, new clustering algorithms are proposed that take the correlation and relevance of the variables into account to improve performance. For interval-valued data, we consider both the case in which the boundaries of the interval-valued variables have the same importance for the clustering process and the case in which they have different importance. Since regularization-based methods are robust to initialization, the proposed approaches introduce a regularization term that controls the membership degrees of the objects. Such regularizations are popular because of their high performance in large-scale data clustering and their low computational complexity. These three-step iterative algorithms provide a fuzzy partition, a representative for each cluster, and the relevance weights of the variables (or their correlations) by minimizing a suitable objective function. Experiments on synthetic and real datasets corroborate the robustness and usefulness of the proposed clustering methods. |
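The abstract describes the proposed methods only at a high level. As a rough illustration of a three-step scheme (representation, weighting, allocation), the following Python sketch implements a generic fuzzy clustering algorithm with automatic variable weighting. It rests on assumptions that the record does not confirm: the regularization term is taken to be an entropy term with a hypothetical temperature `T_u`, the distance is a squared Euclidean distance with one weight per variable and cluster, and each cluster's weights are constrained to multiply to one. The function `entropy_weighted_fcm` and all of its parameters are hypothetical, and correlation-based variants are not covered.

```python
import numpy as np


def entropy_weighted_fcm(X, n_clusters, T_u=1.0, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch: fuzzy clustering with entropy regularization and
    per-cluster variable weights (an assumed formulation, not the thesis's).

    Minimizes, by alternating three exact update steps,
        J = sum_i sum_k u_ik * sum_j lam_kj * (x_ij - g_kj) ** 2
            + T_u * sum_i sum_k u_ik * log(u_ik)
    subject to sum_k u_ik = 1 for each object i and
    prod_j lam_kj = 1 for each cluster k.
    Returns memberships U (n, K), prototypes G (K, p), weights lam (K, p)
    and the objective value recorded after each iteration.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    eps = 1e-12
    # Random initial fuzzy partition; each row sums to one.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    history = []
    for _ in range(n_iter):
        # Step 1 (representation): membership-weighted mean of each cluster.
        G = (U.T @ X) / np.maximum(U.sum(axis=0), eps)[:, None]    # (K, p)
        sq = (X[:, None, :] - G[None, :, :]) ** 2                   # (n, K, p)
        # Step 2 (weighting): lam_kj = geometric mean of S_k / S_kj, so that
        # prod_j lam_kj = 1 and variables with smaller within-cluster
        # dispersion S_kj receive larger weights.
        S = np.einsum("ik,ikj->kj", U, sq) + eps                    # (K, p)
        lam = np.exp(np.log(S).mean(axis=1, keepdims=True)) / S
        # Step 3 (allocation): the entropy term turns the membership update
        # into a softmax of -D / T_u (shifted by the row minimum for stability).
        D = np.einsum("kj,ikj->ik", lam, sq)                        # (n, K)
        U = np.exp(-(D - D.min(axis=1, keepdims=True)) / T_u)
        U /= U.sum(axis=1, keepdims=True)
        J = float(np.sum(U * D) + T_u * np.sum(U * np.log(U + eps)))
        history.append(J)
        if len(history) > 1 and abs(history[-2] - J) < tol:
            break
    return U, G, lam, history


if __name__ == "__main__":
    # Hypothetical interval-valued data: lower and upper bounds are stacked
    # as separate columns, so each bound receives its own weight.
    rng = np.random.default_rng(0)
    lower = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
    upper = lower + rng.uniform(0.5, 1.0, lower.shape)
    U, G, lam, _ = entropy_weighted_fcm(np.hstack([lower, upper]), n_clusters=2)
    print(U.argmax(axis=1))
    print(np.round(lam, 2))
```

Because each step minimizes the objective exactly with respect to one block of unknowns while the others are held fixed, the recorded objective should not increase from one iteration to the next (up to rounding), which is a convenient property to check. Stacking the lower and upper bounds as separate columns, as in the demo, is only one simple option for interval-valued data; it loosely mirrors the case where the two boundaries have different importance. Like other alternating schemes, the result depends on the random initial partition, so several restarts may be needed.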
id | UFPE_ce494f57753cc00d1d6ab9c7e5c338d4 |
---|---|
oai_identifier_str | oai:repositorio.ufpe.br:123456789/44859 |
network_acronym_str | UFPE |
network_name_str | Repositório Institucional da UFPE |
repository_id_str | 2221 |
Sponsorship | FACEPE |
Abstract languages | English; Portuguese |
Deposit license | Termo de Depósito Legal e Autorização para Publicação de Documentos no Repositório Digital da UFPE (legal deposit and publication authorization agreement of the UFPE Digital Repository) |
Files | TESE Sara Inés Rizo Rodríguez.pdf (application/pdf, 4856757 bytes); license.txt (text/plain; charset=utf-8, 2142 bytes); license_rdf (application/rdf+xml; charset=utf-8, 811 bytes); TESE Sara Inés Rizo Rodríguez.pdf.txt (extracted text, text/plain, 288188 bytes); TESE Sara Inés Rizo Rodríguez.pdf.jpg (generated thumbnail, image/jpeg, 1149 bytes) |
OAI-PMH endpoint | https://repositorio.ufpe.br/oai/request |
Record last modified | 2022-06-28 02:22:05.981 |
dc.title.pt_BR.fl_str_mv | Clustering algorithms with new automatic variables weighting |
author | RIZO RODRÍGUEZ, Sara Inés |
author_role | author |
dc.contributor.authorLattes.pt_BR.fl_str_mv | http://lattes.cnpq.br/5082535257923332 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv | http://lattes.cnpq.br/3909162572623711 |
dc.contributor.advisor1.fl_str_mv | CARVALHO, Francisco de Assis Tenório de |
dc.subject.por.fl_str_mv | Inteligência computacional (computational intelligence); Agrupamento (clustering) |
publishDate | 2022 |
dc.date.accessioned.fl_str_mv | 2022-06-27T11:47:22Z |
dc.date.available.fl_str_mv | 2022-06-27T11:47:22Z |
dc.date.issued.fl_str_mv | 2022-02-21 |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/doctoralThesis |
dc.identifier.citation.fl_str_mv | RIZO RODRÍGUEZ, Sara Inés. Clustering algorithms with new automatic variables weighting. 2022. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2022. |
dc.identifier.uri.fl_str_mv | https://repositorio.ufpe.br/handle/123456789/44859 |
dc.language.iso.fl_str_mv | eng |
dc.rights.driver.fl_str_mv | http://creativecommons.org/licenses/by-nc-nd/3.0/br/; info:eu-repo/semantics/openAccess |
dc.publisher.none.fl_str_mv | Universidade Federal de Pernambuco |
dc.publisher.program.fl_str_mv | Programa de Pos Graduacao em Ciencia da Computacao |
dc.publisher.initials.fl_str_mv | UFPE |
dc.publisher.country.fl_str_mv | Brasil |
dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFPE; instname:Universidade Federal de Pernambuco (UFPE); instacron:UFPE |
bitstream.url.fl_str_mv | https://repositorio.ufpe.br/bitstream/123456789/44859/1/TESE%20Sara%20In%c3%a9s%20Rizo%20Rodr%c3%adguez.pdf; https://repositorio.ufpe.br/bitstream/123456789/44859/3/license.txt; https://repositorio.ufpe.br/bitstream/123456789/44859/2/license_rdf; https://repositorio.ufpe.br/bitstream/123456789/44859/4/TESE%20Sara%20In%c3%a9s%20Rizo%20Rodr%c3%adguez.pdf.txt; https://repositorio.ufpe.br/bitstream/123456789/44859/5/TESE%20Sara%20In%c3%a9s%20Rizo%20Rodr%c3%adguez.pdf.jpg |
bitstream.checksum.fl_str_mv | d53e47110ebd8c29aee4261168e0cefc; 6928b9260b07fb2755249a5ca9903395; e39d27027a6cc9cb039ad269a5db8e34; f7cb02eaa651f40dd028884345c52216; 34dc9c9b1febe00bbd92e5e224eb4418 |
bitstream.checksumAlgorithm.fl_str_mv | MD5 (all five files) |
repository.name.fl_str_mv | Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
repository.mail.fl_str_mv | attena@ufpe.br |