Improving the efficiency of k-medoids algorithms using metric access methods

Bibliographic details
Main author: Teixeira, Larissa Roberta
Publication date: 2024
Document type: Master's dissertation
Language: English
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/
Abstract: In the early days of computing, data processing techniques and tools were developed to handle scalar data types. With technological advances, however, the amount and complexity of data have grown significantly, creating a need for techniques that can efficiently handle complex data types. Here, we call complex those data types that have no predefined way of being compared, as is the case with similarity-based comparisons. Among the strategies in the literature, clustering stands out as a promising approach for identifying patterns in data by forming groups, and k-medoids-based methods have emerged as one of the most widely used families of clustering algorithms. However, these methods incur high computational costs on large datasets. Despite numerous efforts in the literature to optimize k-medoids algorithms, they remain limited on large datasets, particularly complex ones, mainly because they must compute and store a distance matrix in memory, which makes them impractical for voluminous data. This master's research proposes KluSIM, a novel approach that improves the computational efficiency of the swap step of k-medoids algorithms. KluSIM employs metric access methods to prune the search space, significantly accelerating the swap step. Additionally, KluSIM eliminates the need to keep a distance matrix in main memory, overcoming the memory limitations of existing methods. Overall, the experiments show that KluSIM effectively optimizes the swap step of k-medoids algorithms, substantially reducing the number of distance calculations required during clustering. Furthermore, KluSIM can be applied to big data tasks, as it proved scalable and effective for clustering the tested datasets.
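The two costs the abstract names (the many distance evaluations in the swap step and the memory of a precomputed distance matrix) can be illustrated with a small sketch. The Python below is a generic PAM-style swap phase, assuming a metric `dist` (Euclidean via `math.dist` here). It computes distances on demand and keeps only O(n) values per pass instead of an n x n matrix. When `prune=True`, it skips d(o, h) whenever the triangle-inequality lower bound |d(o, p) - d(h, p)| to a single pivot p already shows that the distance cannot change the swap cost. This is not KluSIM: the abstract does not describe its metric access method, and the one-pivot filter is only a simple stand-in for metric pruning. `CountingMetric`, `nearest_two`, `pam_swap`, `total_deviation`, the pivot choice, and the synthetic data are hypothetical, chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


class CountingMetric:
    """Wraps a metric and counts how many distances are computed (hypothetical helper)."""

    def __init__(self, fn: Metric) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return self.fn(a, b)


def nearest_two(data: List[Point], medoids: List[int], dist: Metric):
    """For each point: (slot of nearest medoid, nearest distance, second-nearest distance)."""
    info = []
    for x in data:
        ds = sorted((dist(x, data[m]), j) for j, m in enumerate(medoids))
        d_second = ds[1][0] if len(ds) > 1 else math.inf
        info.append((ds[0][1], ds[0][0], d_second))
    return info


def pam_swap(data: List[Point], medoids: List[int], dist: Metric,
             prune: bool = True, pivot: int = 0,
             max_iter: int = 100, tol: float = 1e-12) -> List[int]:
    """Generic PAM-style swap phase (not KluSIM) with on-demand distances.

    No n x n matrix is stored: each pass keeps O(n) values (nearest and
    second-nearest medoid distance per point) plus, if prune=True, each
    point's distance to one pivot, used as a triangle-inequality lower bound.
    """
    medoids = list(medoids)
    k = len(medoids)
    to_pivot = [dist(x, data[pivot]) for x in data] if prune else []
    for _ in range(max_iter):
        info = nearest_two(data, medoids, dist)
        best_delta, best_swap = -tol, None
        for h in range(len(data)):
            if h in medoids:
                continue
            # deltas[i]: change in total deviation if medoid slot i is replaced by h
            deltas = [0.0] * k
            for o, (near, d_n, d_s) in enumerate(info):
                if prune and abs(to_pivot[o] - to_pivot[h]) >= d_s:
                    # Then d(o, h) >= d_s >= d_n: only losing o's own medoid matters,
                    # and o falls back to its second-nearest medoid.
                    deltas[near] += d_s - d_n
                    continue
                d_oh = dist(data[o], data[h])
                deltas[near] += min(d_oh, d_s) - d_n
                if d_oh < d_n:  # o moves to h whichever other medoid is removed
                    for i in range(k):
                        if i != near:
                            deltas[i] += d_oh - d_n
            for i, delta in enumerate(deltas):
                if delta < best_delta:
                    best_delta, best_swap = delta, (i, h)
        if best_swap is None:
            break  # local optimum: no swap lowers the total deviation
        slot, h = best_swap
        medoids[slot] = h
    return medoids


def total_deviation(data: List[Point], medoids: List[int], dist: Metric) -> float:
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(x, data[m]) for m in medoids) for x in data)


# Usage on synthetic data: three well-separated 2-D Gaussian blobs of 30 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
pts = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(30)]
start = [0, 1, 2]  # deliberately poor start: all three medoids in the first blob
full, pruned = CountingMetric(math.dist), CountingMetric(math.dist)
m_full = pam_swap(pts, start, full, prune=False)
m_pruned = pam_swap(pts, start, pruned, prune=True)
print("no pruning:  ", m_full, full.calls, "distance calls")
print("with pruning:", m_pruned, pruned.calls, "distance calls")
```

Both runs should end with one medoid per blob. The pruned run computes fewer distances because the bound is exact for a true metric, so skipped distances could not have changed the swap cost. How much it saves depends on how well the chosen pivot separates the data, which is the role a proper metric access method would play.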
id USP_0c7edc2b34a384ab2d76fb1db58fcddc
oai_identifier_str oai:teses.usp.br:tde-27082024-144742
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Improving the efficiency of k-medoids algorithms using metric access methods
Melhorando a eficiência dos algoritmos k-medoids usando métodos de acesso métrico (Portuguese version of the title)
author Teixeira, Larissa Roberta
author_role author
dc.contributor.none.fl_str_mv Traina Junior, Caetano
dc.contributor.author.fl_str_mv Teixeira, Larissa Roberta
dc.subject.por.fl_str_mv Clustering
Dimensional data
Indexing
k-medoids
Metric access method
publishDate 2024
dc.date.none.fl_str_mv 2024-07-03
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv Content released for public access.
info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
institution USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1809091130539638784