Improving the efficiency of k-medoids algorithms using metric access methods
Main author: | Teixeira, Larissa Roberta |
---|---|
Publication date: | 2024 |
Document type: | Master's dissertation |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/ |
Abstract: | At the dawn of computing, data processing techniques and tools were developed to handle scalar data types. However, technological advances have brought significant growth in both the volume and the complexity of data. This surge has made it necessary to develop techniques that can efficiently handle complex data types. Here, we call complex those data types that have no predefined way of being compared, as is the case with comparisons based on similarity. Among the strategies in the literature, clustering techniques stand out as a promising approach for identifying patterns in data by forming groups. Within clustering, k-medoids-based methods have emerged as one of the most widely used approaches. However, these methods have high computational costs when applied to large datasets. Despite numerous efforts in the literature to optimize k-medoids algorithms, they still face limitations on large datasets, particularly when the data are complex. This is primarily because they need to compute and store a distance matrix in memory, whose size grows quadratically with the number of objects, which makes them impractical for voluminous datasets. This master's research proposes KluSIM, a novel approach to improving the computational efficiency of the swap step in k-medoids algorithms. KluSIM employs metric access methods to prune the search space, significantly accelerating the swap step. Additionally, KluSIM eliminates the need to keep a distance matrix in main memory, overcoming the memory limitations of existing methods. Overall, the experiments show that KluSIM effectively optimizes the swap step of k-medoids algorithms, substantially reducing the number of distance calculations required during clustering. Furthermore, KluSIM can be applied to big data tasks, as it proved scalable and effective for clustering the tested datasets. |
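The swap step mentioned in the abstract can be illustrated with a generic, exhaustive PAM-style sketch in Python. This is not KluSIM and does not reproduce its metric-access-method pruning; it is a baseline under stated assumptions, showing (a) that the swap step can be evaluated with distances computed on demand, so no n x n distance matrix is kept in memory, and (b) how many distance computations an unpruned pass needs. All names (`CountingMetric`, `nearest_two`, `swap_step`, `k_medoids`) are hypothetical, and caching each object's nearest and second-nearest medoid distances is the usual PAM bookkeeping, assumed here rather than taken from the dissertation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class CountingMetric:
    """Wraps a distance function and counts how often it is evaluated."""

    def __init__(self, dist: Metric) -> None:
        self.dist = dist
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return self.dist(a, b)


def nearest_two(data: Sequence[Point], medoids: Sequence[int],
                d: Metric) -> List[Tuple[int, float, float]]:
    """For each object: (nearest medoid index, its distance, second-nearest distance)."""
    result = []
    for o in data:
        ranked = sorted((d(o, data[m]), m) for m in medoids)
        second = ranked[1][0] if len(ranked) > 1 else math.inf
        result.append((ranked[0][1], ranked[0][0], second))
    return result


def swap_step(data: Sequence[Point], medoids: List[int],
              d: Metric) -> Tuple[List[int], bool]:
    """One exhaustive PAM-style swap pass, computing distances on demand.

    Returns the medoids after applying the best improving swap (if any)
    and a flag telling whether a swap was made. No n x n matrix is stored.
    """
    cache = nearest_two(data, medoids, d)
    medoid_set = set(medoids)
    best_delta, best_swap = 0.0, None
    for x in range(len(data)):
        if x in medoid_set:
            continue
        # d(o, x) does not depend on which medoid leaves, so compute it once per candidate.
        dx = [d(o, data[x]) for o in data]
        for m in medoids:
            delta = 0.0
            for i, (near, d_near, d_second) in enumerate(cache):
                # An object that loses its nearest medoid falls back to the second nearest.
                fallback = d_second if near == m else d_near
                delta += min(fallback, dx[i]) - d_near
            if delta < best_delta:
                best_delta, best_swap = delta, (m, x)
    if best_swap is None:
        return list(medoids), False
    m, x = best_swap
    return [x if q == m else q for q in medoids], True


def k_medoids(data: Sequence[Point], medoids: Sequence[int], d: Metric,
              max_iter: int = 100) -> List[int]:
    """Repeat the swap step until no swap lowers the total deviation."""
    current = list(medoids)
    for _ in range(max_iter):
        current, improved = swap_step(data, current, d)
        if not improved:
            break
    return current


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    metric = CountingMetric(math.dist)
    print(k_medoids(points, [0, 1], metric), metric.calls)  # [0, 3] 72
```

On the six-point example with k = 2, each pass costs n·k + (n - k)·n = 12 + 24 = 36 distance evaluations, and two passes are needed (one improving swap, one confirming no further improvement). This per-pass count is the kind of cost the abstract reports KluSIM reducing by pruning candidates with a metric access method; the sketch itself evaluates every candidate.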
id |
USP_0c7edc2b34a384ab2d76fb1db58fcddc |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-27082024-144742 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
dc.title.none.fl_str_mv |
Improving the efficiency of k-medoids algorithms using metric access methods / Melhorando a eficiência dos algoritmos k-medoids usando métodos de acesso métrico (Portuguese title) |
title |
Improving the efficiency of k-medoids algorithms using metric access methods |
author |
Teixeira, Larissa Roberta |
author_role |
author |
dc.contributor.none.fl_str_mv |
Traina Junior, Caetano |
dc.contributor.author.fl_str_mv |
Teixeira, Larissa Roberta |
dc.subject.por.fl_str_mv |
Clustering; Dimensional data; Indexing; k-medoids; Metric access method |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-07-03 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/ |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
Content released for public access. info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br || atendimento@aguia.usp.br |