Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning
Autor(a) principal: | |
---|---|
Data de Publicação: | 2022 |
Outros Autores: | , , , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Texto Completo: | http://hdl.handle.net/10400.22/22041 |
Resumo: | This paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its implementation in Apache Spark further increases its ability to handle very large datasets. Moreover, the algorithm incorporates an automatic hyperparameter tuning mechanism so that users do not have to implement costly manual tuning. Our LSHAD method is novel as both hyperparameter automation and distributed properties are not usual in AD techniques. Our results for experiments with LSHAD across a variety of datasets point to state-of-the-art AD performance while handling much larger datasets than state-of-the-art alternatives. In addition, evaluation results for the tradeoff between AD performance and scalability show that our method offers significant advantages over competing methods. |
id |
RCAP_d283091dfc19188823fa18214d366a51 |
---|---|
oai_identifier_str |
oai:recipp.ipp.pt:10400.22/22041 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuningAnomaly detectionUnsupervised learningAutoMLScalabilityBig dataThis paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its implementation in Apache Spark further increases its ability to handle very large datasets. Moreover, the algorithm incorporates an automatic hyperparameter tuning mechanism so that users do not have to implement costly manual tuning. Our LSHAD method is novel as both hyperparameter automation and distributed properties are not usual in AD techniques. Our results for experiments with LSHAD across a variety of datasets point to state-of-the-art AD performance while handling much larger datasets than state-of-the-art alternatives. In addition, evaluation results for the tradeoff between AD performance and scalability show that our method offers significant advantages over competing methods.This research has been financially supported in part by the Spanish Ministerio de Economía y Competitividad (project PID-2019-109238GB-C22) and by the Xunta de Galicia (grants ED431C 2018/34 and ED431G 2019/01) through European Union ERDF funds. CITIC, as a research center accredited by the Galician University System, is funded by the Consellería de Cultura, Educación e Universidades of the Xunta de Galicia, supported 80% through ERDF Funds (ERDF Operational Programme Galicia 2014–2020) and 20% by the Secretaría Xeral de Universidades (Grant ED431G 2019/01).This work was also supported by National Funds through the Portuguese FCT - Fundação para a Ciência e a Tecnologia (projects UIDB/00760/2020 and UIDP/00760/2020).ElsevierRepositório Científico do Instituto Politécnico do PortoMeira, JorgeEiras-Franco, CarlosBolón-Canedo, VerónicaMarreiros, GoretiAlonso-Betanzos, Amparo2023-01-31T15:46:52Z20222022-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://hdl.handle.net/10400.22/22041eng10.1016/j.ins.2022.06.035info:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-03-13T13:18:24Zoai:recipp.ipp.pt:10400.22/22041Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T17:42:07.112878Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse |
dc.title.none.fl_str_mv |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning |
title |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning |
spellingShingle |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning Meira, Jorge Anomaly detection Unsupervised learning AutoML Scalability Big data |
title_short |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning |
title_full |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning |
title_fullStr |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning |
title_full_unstemmed |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning |
title_sort |
Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning |
author |
Meira, Jorge |
author_facet |
Meira, Jorge Eiras-Franco, Carlos Bolón-Canedo, Verónica Marreiros, Goreti Alonso-Betanzos, Amparo |
author_role |
author |
author2 |
Eiras-Franco, Carlos Bolón-Canedo, Verónica Marreiros, Goreti Alonso-Betanzos, Amparo |
author2_role |
author author author author |
dc.contributor.none.fl_str_mv |
Repositório Científico do Instituto Politécnico do Porto |
dc.contributor.author.fl_str_mv |
Meira, Jorge Eiras-Franco, Carlos Bolón-Canedo, Verónica Marreiros, Goreti Alonso-Betanzos, Amparo |
dc.subject.por.fl_str_mv |
Anomaly detection Unsupervised learning AutoML Scalability Big data |
topic |
Anomaly detection Unsupervised learning AutoML Scalability Big data |
description |
This paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its implementation in Apache Spark further increases its ability to handle very large datasets. Moreover, the algorithm incorporates an automatic hyperparameter tuning mechanism so that users do not have to implement costly manual tuning. Our LSHAD method is novel as both hyperparameter automation and distributed properties are not usual in AD techniques. Our results for experiments with LSHAD across a variety of datasets point to state-of-the-art AD performance while handling much larger datasets than state-of-the-art alternatives. In addition, evaluation results for the tradeoff between AD performance and scalability show that our method offers significant advantages over competing methods. |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022 2022-01-01T00:00:00Z 2023-01-31T15:46:52Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.22/22041 |
url |
http://hdl.handle.net/10400.22/22041 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1016/j.ins.2022.06.035 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier |
publisher.none.fl_str_mv |
Elsevier |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799131507237847040 |