Improving the efficiency of k-medoids algorithms using metric access methods

Teixeira, Larissa Roberta

Bibliographic details
Main author:	Teixeira, Larissa Roberta
Publication date:	2024
Document type:	Master's thesis
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações da USP
Full text:	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/
Abstract:	In the early days of computing, data processing techniques and tools were developed to handle scalar data types. With technological advances, however, both the volume and the complexity of data have grown significantly, which has required techniques that can efficiently handle complex data types. Here, complex data types are those with no predefined way of being compared, as is the case for comparisons based on similarity. Among the strategies in the literature, clustering techniques stand out as a promising approach for identifying patterns in data by forming groups, and k-medoids-based methods are among the most widely used clustering algorithms. However, these methods have high computational costs on large datasets. Despite numerous efforts to optimize k-medoids algorithms, they remain limited on large datasets, particularly when the data are complex, primarily because they must compute a distance matrix and store it in memory, which makes them impractical for voluminous datasets. This master's research proposes KluSIM, a novel algorithm that improves the computational efficiency of the swap step in k-medoids algorithms. KluSIM uses access methods to prune the search space, significantly accelerating the swap step. KluSIM also eliminates the need to keep a distance matrix in main memory, overcoming the memory limitations of existing methods. Overall, the experiments show that KluSIM effectively optimizes the swap step of k-medoids algorithms, substantially reducing the number of distance calculations required during clustering. Furthermore, KluSIM can be applied to big data tasks, as it proved scalable and effective for clustering the tested datasets.
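
Illustrative note (not part of the original record): the swap step mentioned in the abstract can be sketched as follows. This is a minimal, naive baseline written for illustration only; it is not KluSIM and uses no access method. The function names `total_cost` and `naive_swap_step`, the sample points, and the choice of Euclidean distance are all assumptions. Distances are computed on demand, so no distance matrix is stored, but every candidate swap re-evaluates all point-to-medoid distances, which is the cost that pruning is meant to reduce.

```python
from itertools import product
from math import dist  # Euclidean distance (Python 3.8+); an assumed metric


def total_cost(points, medoids, d=dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(d(p, points[m]) for m in medoids) for p in points)


def naive_swap_step(points, medoids, d=dist):
    """Try every (medoid, non-medoid) swap; return the best medoid set and its cost.

    Hypothetical baseline: exhaustive search with no pruning and no stored
    distance matrix. Each candidate swap recomputes n * k distances.
    """
    best_medoids, best_cost = list(medoids), total_cost(points, medoids, d)
    non_medoids = [i for i in range(len(points)) if i not in medoids]
    for m, c in product(medoids, non_medoids):
        # Replace medoid m with candidate c and evaluate the new clustering cost.
        candidate = [c if x == m else x for x in medoids]
        cost = total_cost(points, candidate, d)
        if cost < best_cost:
            best_medoids, best_cost = candidate, cost
    return best_medoids, best_cost


# Two well-separated groups; both initial medoids sit in the first group.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
medoids, cost = naive_swap_step(points, [1, 2])
print(medoids, round(cost, 3))
```

With k medoids and n points, this sketch evaluates k(n - k) candidate swaps at n * k distance computations each, which illustrates why reducing the number of distance calculations matters for large or complex datasets.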

Item metadata

id	USP_0c7edc2b34a384ab2d76fb1db58fcddc
oai_identifier_str	oai:teses.usp.br:tde-27082024-144742
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
dc.title.none.fl_str_mv	Improving the efficiency of k-medoids algorithms using metric access methods; Melhorando a eficiência dos algoritmos k-medoids usando métodos de acesso métrico
title	Improving the efficiency of k-medoids algorithms using metric access methods
author	Teixeira, Larissa Roberta
author_facet	Teixeira, Larissa Roberta
author_role	author
dc.contributor.none.fl_str_mv	Traina Junior, Caetano
dc.contributor.author.fl_str_mv	Teixeira, Larissa Roberta
dc.subject.por.fl_str_mv	Agrupamento; Clustering; Dados dimensionais; Dimensional data; Indexação; Indexing; k-medoids; Métodos de acesso métrico; Metric access method
topic	Agrupamento; Clustering; Dados dimensionais; Dimensional data; Indexação; Indexing; k-medoids; Métodos de acesso métrico; Metric access method
publishDate	2024
dc.date.none.fl_str_mv	2024-07-03
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/
url	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br || virginia@if.usp.br
_version_	1809091130539638784
