Deobfuscating leetspeak with deep learning to improve spam filtering
Main author: | Mendizabal, I. V. |
---|---|
Publication date: | 2023 |
Other authors: | Vidriales, X.; Basto-Fernandes, V.; Ezpeleta, E.; Méndez, J. R.; Zurutuza, U. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10071/30730 |
Abstract: | The evolution of anti-spam filters has forced spammers to make greater efforts to bypass filters in order to distribute content over networks. The distribution of content encoded in images or the use of Leetspeak are concrete and clear examples of techniques currently used to bypass filters. Despite the importance of dealing with these problems, the number of studies to solve them is quite small, and the reported performance is very limited. This study reviews the work done so far (very rudimentary) for Leetspeak deobfuscation and proposes a new technique based on using neural networks for decoding purposes. In addition, we distribute an image database specifically created for training Leetspeak decoding models. We have also created and made available four different corpora to analyse the performance of Leetspeak decoding schemes. Using these corpora, we have experimentally evaluated our neural network approach for decoding Leetspeak. The results obtained have shown the usefulness of the proposed model for addressing the deobfuscation of Leetspeak character sequences. © 2023, Universidad Internacional de la Rioja. |
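
To make the task in the abstract concrete, the following is a minimal sketch of a table-based Leetspeak decoder of the simple kind a learned model is meant to improve on. It is not the method proposed in the article: the substitution table `LEET_TABLE` and the function `decode_leet` are hypothetical names and values chosen only for illustration.

```python
# Hypothetical, hand-picked substitution table (an assumption, not from the article).
LEET_TABLE = {
    "0": "o",
    "1": "i",
    "3": "e",
    "4": "a",
    "5": "s",
    "7": "t",
    "@": "a",
    "$": "s",
}


def decode_leet(text: str) -> str:
    """Map leet symbols back to letters using a fixed one-to-one table.

    Only tokens that contain at least one letter are rewritten, so purely
    numeric tokens such as phone numbers are left untouched.
    """
    decoded_tokens = []
    for token in text.split(" "):
        if any(ch.isalpha() for ch in token):
            token = "".join(LEET_TABLE.get(ch, ch) for ch in token)
        decoded_tokens.append(token)
    return " ".join(decoded_tokens)


if __name__ == "__main__":
    print(decode_leet("fr33 m0n3y n0w"))  # free money now
    print(decode_leet("call 555 today"))  # call 555 today
```

A fixed table like this cannot resolve ambiguous symbols (for example, "1" may stand for "i" or "l") and does not handle multi-character substitutions, which is one reason a trained decoder is attractive for this problem.
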
id |
RCAP_8307303e45fd87644ce6272aae7b396c |
---|---|
oai_identifier_str |
oai:repositorio.iscte-iul.pt:10071/30730 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Deobfuscating leetspeak with deep learning to improve spam filtering |
title |
Deobfuscating leetspeak with deep learning to improve spam filtering |
author |
Mendizabal, I. V. |
author_facet |
Mendizabal, I. V.; Vidriales, X.; Basto-Fernandes, V.; Ezpeleta, E.; Méndez, J. R.; Zurutuza, U. |
author_role |
author |
author2 |
Vidriales, X.; Basto-Fernandes, V.; Ezpeleta, E.; Méndez, J. R.; Zurutuza, U. |
author2_role |
author author author author author |
dc.contributor.author.fl_str_mv |
Mendizabal, I. V.; Vidriales, X.; Basto-Fernandes, V.; Ezpeleta, E.; Méndez, J. R.; Zurutuza, U. |
dc.subject.por.fl_str_mv |
Convolutional neural networks; Deep learning; Leetspeak; Spam filtering; Text deobfuscation |
topic |
Convolutional neural networks; Deep learning; Leetspeak; Spam filtering; Text deobfuscation |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-01-01T00:00:00Z 2023 2024-01-31T12:50:56Z 2024-01-31T12:49:51Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10071/30730 |
url |
http://hdl.handle.net/10071/30730 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
1989-1660 10.9781/ijimai.2023.07.003 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidad Internacional de La Rioja |
publisher.none.fl_str_mv |
Universidad Internacional de La Rioja |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799137169486381056 |