Deobfuscating leetspeak with deep learning to improve spam filtering

Mendizabal, I. V.; Vidriales, X.; Basto-Fernandes, V.; Ezpeleta, E.; Méndez, J. R.; Zurutuza, U.


Bibliographic details
Main author:	Mendizabal, I. V.
Publication date:	2023
Other authors:	Vidriales, X., Basto-Fernandes, V., Ezpeleta, E., Méndez, J. R., Zurutuza, U.
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10071/30730
Abstract:	The evolution of anti-spam filters has forced spammers to make greater efforts to bypass filters in order to distribute content over networks. The distribution of content encoded in images or the use of Leetspeak are concrete and clear examples of techniques currently used to bypass filters. Despite the importance of dealing with these problems, the number of studies to solve them is quite small, and the reported performance is very limited. This study reviews the work done so far (very rudimentary) for Leetspeak deobfuscation and proposes a new technique based on using neural networks for decoding purposes. In addition, we distribute an image database specifically created for training Leetspeak decoding models. We have also created and made available four different corpora to analyse the performance of Leetspeak decoding schemes. Using these corpora, we have experimentally evaluated our neural network approach for decoding Leetspeak. The results obtained have shown the usefulness of the proposed model for addressing the deobfuscation of Leetspeak character sequences. © 2023, Universidad Internacional de la Rioja.

Item metadata

id	RCAP_8307303e45fd87644ce6272aae7b396c
oai_identifier_str	oai:repositorio.iscte-iul.pt:10071/30730
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Deobfuscating leetspeak with deep learning to improve spam filtering
author	Mendizabal, I. V.
author_facet	Mendizabal, I. V. Vidriales, X. Basto-Fernandes, V. Ezpeleta, E. Méndez, J. R. Zurutuza, U.
author_role	author
author2	Vidriales, X. Basto-Fernandes, V. Ezpeleta, E. Méndez, J. R. Zurutuza, U.
author2_role	author author author author author
dc.contributor.author.fl_str_mv	Mendizabal, I. V. Vidriales, X. Basto-Fernandes, V. Ezpeleta, E. Méndez, J. R. Zurutuza, U.
dc.subject.por.fl_str_mv	Convolutional neural networks Deep learning Leetspeak Spam filtering Text deobfuscation
topic	Convolutional neural networks Deep learning Leetspeak Spam filtering Text deobfuscation
publishDate	2023
dc.date.none.fl_str_mv	2023-01-01T00:00:00Z 2023 2024-01-31T12:50:56Z 2024-01-31T12:49:51Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10071/30730
url	http://hdl.handle.net/10071/30730
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	1989-1660 10.9781/ijimai.2023.07.003
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidad Internacional de La Rioja
publisher.none.fl_str_mv	Universidad Internacional de La Rioja
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799137169486381056
