Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos

Bibliographic details
Main author: Gardini, Victor Fernandes
Publication date: 2022
Document type: Undergraduate thesis (Trabalho de conclusão de curso)
Language: por
Source title: Repositório Institucional da UNESP
Full text: http://hdl.handle.net/11449/216224
Abstract: The use of domain names for malicious activity on the internet is a global problem, and Brazil in particular ranks among the countries most affected by phishing attacks. Among the approaches studied in academia, the use of machine learning to classify domains as malicious or legitimate stands out. To this end, a three-stage domain classification was proposed, in which each stage is connected to a single system called the DNS framework. That system allows new models to be trained and new datasets to be submitted for the training step, but the previous models and lists are discarded in the process. To address this, approaches used by the academic community were studied, leading to a set of techniques for managing machine learning models; these practices are commonly grouped under the term MLOps. On that basis, a new system was built that can store, version, and monitor models, lists, and system logs, and it was later integrated with the framework. As a result, each stage can independently have its training sets built incrementally through well-defined operations, without losing the results of the previous process. New models can also be created through an automated pipeline and made available in the production environment.
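Illustrative sketch (not taken from the thesis): the abstract describes storing and versioning models and lists so that a new training run or list update no longer discards the previous one. The Python below shows one minimal way such an append-only store could work. The names VersionedStore, add_domains, and stage1_blocklist, as well as the on-disk layout, are assumptions made for illustration and are not the author's implementation.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class VersionedStore:
    """Hypothetical append-only store: each save creates a new version directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def versions(self, name):
        """Return the existing version numbers of an artifact, in ascending order."""
        base = self.root / name
        if not base.is_dir():
            return []
        return sorted(int(p.name[1:]) for p in base.iterdir() if p.is_dir())

    def save(self, name, payload, metadata=None):
        """Write payload (bytes) as the next version; older versions are never modified."""
        existing = self.versions(name)
        version = existing[-1] + 1 if existing else 1
        target = self.root / name / f"v{version}"
        target.mkdir(parents=True)
        (target / "artifact.bin").write_bytes(payload)
        info = {"version": version, "sha256": hashlib.sha256(payload).hexdigest()}
        info.update(metadata or {})
        (target / "meta.json").write_text(json.dumps(info))
        return version

    def load(self, name, version=None):
        """Read a given version, or the latest one when no version is requested."""
        existing = self.versions(name)
        if not existing:
            raise KeyError(name)
        chosen = existing[-1] if version is None else version
        return (self.root / name / f"v{chosen}" / "artifact.bin").read_bytes()


def add_domains(store, list_name, new_domains):
    """A well-defined 'add' operation: derive the next list version from the latest one."""
    try:
        current = set(json.loads(store.load(list_name)))
    except KeyError:
        current = set()  # first version of this list
    updated = sorted(current | set(new_domains))
    return store.save(list_name, json.dumps(updated).encode(), {"operation": "add"})


if __name__ == "__main__":
    store = VersionedStore(tempfile.mkdtemp())
    add_domains(store, "stage1_blocklist", ["bad.example"])
    add_domains(store, "stage1_blocklist", ["worse.example"])
    print(store.versions("stage1_blocklist"))
```

Because every operation writes a new version instead of overwriting, an earlier list or serialized model can still be loaded by its version number after later updates, which is the property the abstract says the original framework lacked.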
id UNSP_c83e10f6ce58368e224c3b228a91693d
oai_identifier_str oai:repositorio.unesp.br:11449/216224
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
Automating machine learning processes of the DNS framework for malicious domain detection
Cybersecurity
Machine learning
Automation
DNS framework
Cibersegurança
MLOps
Automatização
Framework DNS
Fundação para o Desenvolvimento da UNESP (FUNDUNESP)
NIC.br: 2764/2018
Universidade Estadual Paulista (Unesp)
Cansian, Adriano Mauro [UNESP]
Universidade Estadual Paulista (Unesp)
Gardini, Victor Fernandes
2022-01-31T20:16:56Z
2022-01-31T20:16:56Z
2022-01-13
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/bachelorThesis
application/pdf
http://hdl.handle.net/11449/216224
por
info:eu-repo/semantics/openAccess
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
2023-12-28T06:16:50Z
oai:repositorio.unesp.br:11449/216224
Repositório Institucional
PUB
http://repositorio.unesp.br/oai/request
opendoar:2946
2024-08-05T21:30:16.379025
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
false
dc.title.none.fl_str_mv Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
Automating machine learning processes of the DNS framework for malicious domain detection
title Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
spellingShingle Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
Gardini, Victor Fernandes
Cybersecurity
Machine learning
Automation
DNS framework
Cibersegurança
MLOps
Automatização
Framework DNS
title_short Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
title_full Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
title_fullStr Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
title_full_unstemmed Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
title_sort Automatização de processos de machine learning do framework DNS para a detecção de domínios maliciosos
author Gardini, Victor Fernandes
author_facet Gardini, Victor Fernandes
author_role author
dc.contributor.none.fl_str_mv Cansian, Adriano Mauro [UNESP]
Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv Gardini, Victor Fernandes
dc.subject.por.fl_str_mv Cybersecurity
Machine learning
Automation
DNS framework
Cibersegurança
MLOps
Automatização
Framework DNS
topic Cybersecurity
Machine learning
Automation
DNS framework
Cibersegurança
MLOps
Automatização
Framework DNS
description The use of domain names for malicious activity on the internet is a global problem, and Brazil in particular ranks among the countries most affected by phishing attacks. Among the approaches studied in academia, the use of machine learning to classify domains as malicious or legitimate stands out. To this end, a three-stage domain classification was proposed, in which each stage is connected to a single system called the DNS framework. That system allows new models to be trained and new datasets to be submitted for the training step, but the previous models and lists are discarded in the process. To address this, approaches used by the academic community were studied, leading to a set of techniques for managing machine learning models; these practices are commonly grouped under the term MLOps. On that basis, a new system was built that can store, version, and monitor models, lists, and system logs, and it was later integrated with the framework. As a result, each stage can independently have its training sets built incrementally through well-defined operations, without losing the results of the previous process. New models can also be created through an automated pipeline and made available in the production environment.
publishDate 2022
dc.date.none.fl_str_mv 2022-01-31T20:16:56Z
2022-01-31T20:16:56Z
2022-01-13
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/bachelorThesis
format bachelorThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/11449/216224
url http://hdl.handle.net/11449/216224
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Estadual Paulista (Unesp)
publisher.none.fl_str_mv Universidade Estadual Paulista (Unesp)
dc.source.none.fl_str_mv reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv
_version_ 1808129327944957952