Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning

Bibliographic details
Main author: Meira, Jorge
Publication date: 2022
Other authors: Eiras-Franco, Carlos; Bolón-Canedo, Verónica; Marreiros, Goreti; Alonso-Betanzos, Amparo
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10400.22/22041
Abstract: This paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its implementation in Apache Spark further increases its ability to handle very large datasets. Moreover, the algorithm incorporates an automatic hyperparameter tuning mechanism so that users do not have to implement costly manual tuning. Our LSHAD method is novel as both hyperparameter automation and distributed properties are not usual in AD techniques. Our results for experiments with LSHAD across a variety of datasets point to state-of-the-art AD performance while handling much larger datasets than state-of-the-art alternatives. In addition, evaluation results for the tradeoff between AD performance and scalability show that our method offers significant advantages over competing methods.
id RCAP_d283091dfc19188823fa18214d366a51
oai_identifier_str oai:recipp.ipp.pt:10400.22/22041
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Funding: This research has been financially supported in part by the Spanish Ministerio de Economía y Competitividad (project PID-2019-109238GB-C22) and by the Xunta de Galicia (grants ED431C 2018/34 and ED431G 2019/01) through European Union ERDF funds. CITIC, as a research center accredited by the Galician University System, is funded by the Consellería de Cultura, Educación e Universidades of the Xunta de Galicia, supported 80% through ERDF Funds (ERDF Operational Programme Galicia 2014–2020) and 20% by the Secretaría Xeral de Universidades (Grant ED431G 2019/01). This work was also supported by National Funds through the Portuguese FCT - Fundação para a Ciência e a Tecnologia (projects UIDB/00760/2020 and UIDP/00760/2020).
dc.contributor.none.fl_str_mv Repositório Científico do Instituto Politécnico do Porto
dc.contributor.author.fl_str_mv Meira, Jorge
Eiras-Franco, Carlos
Bolón-Canedo, Verónica
Marreiros, Goreti
Alonso-Betanzos, Amparo
dc.subject.por.fl_str_mv Anomaly detection
Unsupervised learning
AutoML
Scalability
Big data
publishDate 2022
dc.date.none.fl_str_mv 2022
2022-01-01T00:00:00Z
2023-01-31T15:46:52Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.22/22041
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv 10.1016/j.ins.2022.06.035
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP