Detection of Newly Registered Malicious Domains through Passive DNS

Bibliographic details
Main author: Silveira, Marcos Rogério [UNESP]
Publication date: 2021
Other authors: Marcos Da Silva, Leandro [UNESP]; Cansian, Adriano Mauro [UNESP]; Kobayashi, Hugo Koji
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1109/BigData52589.2021.9671348
http://hdl.handle.net/11449/234203
Abstract: Because DNS is essential to the functioning of the Internet, attackers register domains for malicious purposes such as spreading malware and phishing. This work presents an approach that detects malicious domains just 72 hours after the first DNS query. The data source was passive DNS collected from an authoritative TLD server and later enriched with additional columns, including geolocation-related data, for a total of 20 features. The model uses LightGBM as the machine learning algorithm, together with oversampling and undersampling techniques for data balancing, such as Cluster Centroids and K-Means SMOTE. It achieved an average AUC of 0.9763 and an F1-score of 0.905, along with a TPR of 0.8656 in the validation of the model.
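The abstract reports three evaluation figures: AUC, F1-score, and TPR. As a reminder of how these metrics are defined for a binary malicious/benign classifier, here is a minimal sketch in plain Python. The toy labels and scores, the 0.5 decision threshold, and all function names are illustrative assumptions; none of this is the authors' data or code, and the values it prints are unrelated to the results in the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 = malicious and 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def tpr(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate (recall): share of malicious domains that were flagged."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random malicious example is scored
    higher than a random benign one; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 malicious and 5 benign domains with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
# Assumed decision threshold of 0.5 to turn scores into hard predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("TPR:", tpr(y_true, y_pred))
print("F1: ", f1_score(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

TPR and F1 depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-independent, which is one reason the three are commonly reported together on imbalanced data.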
id UNSP_a6287bc672fa509f0bfbc3167047b4d2
oai_identifier_str oai:repositorio.unesp.br:11449/234203
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
dc.title.none.fl_str_mv Detection of Newly Registered Malicious Domains through Passive DNS
title Detection of Newly Registered Malicious Domains through Passive DNS
author Silveira, Marcos Rogério [UNESP]
author_facet Silveira, Marcos Rogério [UNESP]
Marcos Da Silva, Leandro [UNESP]
Cansian, Adriano Mauro [UNESP]
Kobayashi, Hugo Koji
author_role author
author2 Marcos Da Silva, Leandro [UNESP]
Cansian, Adriano Mauro [UNESP]
Kobayashi, Hugo Koji
author2_role author
author
author
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (UNESP)
Brazilian Network Information Center (NIC.br)
dc.contributor.author.fl_str_mv Silveira, Marcos Rogério [UNESP]
Marcos Da Silva, Leandro [UNESP]
Cansian, Adriano Mauro [UNESP]
Kobayashi, Hugo Koji
dc.subject.por.fl_str_mv Data Imbalanced
Domain Name System
Machine Learning
Malicious Domains
Passive DNS
topic Data Imbalanced
Domain Name System
Machine Learning
Malicious Domains
Passive DNS
description Because DNS is essential to the functioning of the Internet, attackers register domains for malicious purposes such as spreading malware and phishing. This work presents an approach that detects malicious domains just 72 hours after the first DNS query. The data source was passive DNS collected from an authoritative TLD server and later enriched with additional columns, including geolocation-related data, for a total of 20 features. The model uses LightGBM as the machine learning algorithm, together with oversampling and undersampling techniques for data balancing, such as Cluster Centroids and K-Means SMOTE. It achieved an average AUC of 0.9763 and an F1-score of 0.905, along with a TPR of 0.8656 in the validation of the model.
publishDate 2021
dc.date.none.fl_str_mv 2021-01-01
2022-05-01T13:57:35Z
2022-05-01T13:57:35Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1109/BigData52589.2021.9671348
Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021, p. 3360-3369.
http://hdl.handle.net/11449/234203
10.1109/BigData52589.2021.9671348
2-s2.0-85125311630
url http://dx.doi.org/10.1109/BigData52589.2021.9671348
http://hdl.handle.net/11449/234203
identifier_str_mv Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021, p. 3360-3369.
10.1109/BigData52589.2021.9671348
2-s2.0-85125311630
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 3360-3369
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)