Dealing with repeated objects in SNNagg

Bibliographic details
Main author: Galvão, João Rui Magalhães Velho da Cunha
Publication date: 2016
Other authors: Santos, Maribel Yasmina; Pires, João Moura; Costa, Carlos
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/1822/42342
Abstract: Constant technological advances and the massive use of electronic devices have caused the amount of generated data to grow at a very high rate, creating an urgent need to process larger amounts of data in less time. To handle these volumes, several techniques and algorithms have been developed in the area of knowledge discovery in databases, a process that consists of several stages, including data mining, which analyzes vast amounts of data to identify patterns, models, or trends. Among the several data mining techniques, this work focuses on clustering spatial data with a density-based approach that uses the Shared Nearest Neighbor (SNN) algorithm. SNN has shown several advantages when analyzing this type of data: it identifies clusters of different sizes, shapes, and densities, and it also deals with noise. This paper presents and evaluates a new extension of SNN that is able to deal with repeated objects by creating aggregates that reduce the processing time required to cluster a given dataset, because repeated objects are excluded from the most time-consuming step, the identification of the k-nearest neighbors of a point. The proposed approach, SNNagg, was evaluated, and the results show that the processing time is reduced without compromising the quality of the obtained clusters.
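The aggregation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`aggregate`, `knn_unique`, `shared_neighbours`), the brute-force neighbor search, and the sample points are all assumptions made for illustration. The sketch only shows that collapsing repeated objects before the k-nearest-neighbor step reduces the number of points that step has to visit; how SNNagg uses the aggregate sizes afterwards is not specified in this record and is left out.

```python
from collections import Counter
import math


def aggregate(points):
    """Collapse identical points into unique points plus repetition counts.

    Hypothetical helper: returns (unique_points, counts), in first-seen order.
    """
    counts = Counter(points)
    unique = list(counts)
    return unique, [counts[p] for p in unique]


def knn_unique(unique, k):
    """Brute-force k nearest neighbours, computed over unique points only.

    Returns, for each unique point, the indices of its k closest other
    unique points (ties keep index order because sorted() is stable).
    """
    neighbours = []
    for i, p in enumerate(unique):
        others = [j for j in range(len(unique)) if j != i]
        others.sort(key=lambda j: math.dist(p, unique[j]))
        neighbours.append(others[:k])
    return neighbours


def shared_neighbours(neighbours, i, j):
    """SNN-style similarity: number of neighbours two points have in common."""
    return len(set(neighbours[i]) & set(neighbours[j]))


# Illustrative data (assumed): 7 objects, only 4 distinct locations.
points = [(0, 0), (0, 0), (0, 0), (1, 0), (0, 1), (5, 5), (5, 5)]
unique, counts = aggregate(points)
neighbours = knn_unique(unique, k=2)
```

In this example the neighbor search runs over 4 aggregates instead of 7 objects, while `counts` keeps the information needed to map results back to the original objects.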
id RCAP_a92abd5e913e2d9fd2cade0f357df8c3
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/42342
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Funding: This work has been supported by FCT, Fundação para a Ciência e Tecnologia, within the Project Scope UID/CEC/00319/2013.
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Galvão, João Rui Magalhães Velho da Cunha
Santos, Maribel Yasmina
Pires, João Moura
Costa, Carlos
dc.subject.por.fl_str_mv Spatial Data
Spatio-Temporal Data
Clustering
SNN
Density-based Clustering
Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
publishDate 2016
dc.date.none.fl_str_mv 2016-02-16
2016-02-16T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1822/42342
url http://hdl.handle.net/1822/42342
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Joao Galvão, Maribel Yasmina Santos, Joao Moura Pires, and Carlos Costa, "Dealing with Repeated Objects in SNNagg", IAENG International Journal of Computer Science, vol. 43, no. 1, pp. 115-125, 2016, ISSN: 1819656X.
1819656X
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv IAENG
publisher.none.fl_str_mv IAENG
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
_version_ 1777303709600972800