Detection of Newly Registered Malicious Domains through Passive DNS
Main author: | Silveira, Marcos Rogério [UNESP] |
---|---|
Publication date: | 2021 |
Other authors: | Marcos Da Silva, Leandro [UNESP]; Cansian, Adriano Mauro [UNESP]; Kobayashi, Hugo Koji |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1109/BigData52589.2021.9671348 http://hdl.handle.net/11449/234203 |
Abstract: | Because DNS is essential to the functioning of the Internet, malicious users register domains for purposes such as spreading malware and phishing. This work develops an approach capable of detecting malicious domains just 72 hours after the first DNS query. The data source was passive DNS collected from an authoritative TLD server and later enriched with additional columns, including geolocation data, which resulted in 20 features. The model uses LightGBM as the machine learning algorithm, with undersampling and oversampling techniques for data balancing, such as Cluster Centroids and K-Means SMOTE, respectively. It achieved an average AUC of 0.9763 and F1-score of 0.905, in addition to a TPR of 0.8656 in the validation of the model. |
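The abstract reports three evaluation metrics (AUC, F1-score and TPR). As a reading aid, the sketch below shows how these quantities are commonly computed for a binary malicious/benign classifier. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the numbers it produces have no relation to the results in the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means malicious."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def true_positive_rate(tp: int, fn: int) -> float:
    """TPR (recall): share of malicious domains that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg) and is
    meant for small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 malicious and 5 benign domains with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("TPR:", true_positive_rate(tp, fn))
print("F1 :", f1_score(tp, fp, fn))
print("AUC:", roc_auc(y_true, scores))
```

TPR and F1 depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-independent, which is one reason the three are usually reported together for imbalanced data.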
id |
UNSP_a6287bc672fa509f0bfbc3167047b4d2 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/234203 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
dc.title.none.fl_str_mv |
Detection of Newly Registered Malicious Domains through Passive DNS |
author |
Silveira, Marcos Rogério [UNESP] |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (UNESP); Brazilian Network Information Center (NIC.br) |
dc.contributor.author.fl_str_mv |
Silveira, Marcos Rogério [UNESP]; Marcos Da Silva, Leandro [UNESP]; Cansian, Adriano Mauro [UNESP]; Kobayashi, Hugo Koji |
dc.subject.por.fl_str_mv |
Data Imbalanced; Domain Name System; Machine Learning; Malicious Domains; Passive DNS |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-01-01 2022-05-01T13:57:35Z 2022-05-01T13:57:35Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1109/BigData52589.2021.9671348; Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021, p. 3360-3369; http://hdl.handle.net/11449/234203; 10.1109/BigData52589.2021.9671348; 2-s2.0-85125311630 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
3360-3369 |
dc.source.none.fl_str_mv |
Scopus; reponame: Repositório Institucional da UNESP; instname: Universidade Estadual Paulista (UNESP); instacron: UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |