Descoberta de exploits usando dados da rede social Twitter

Sousa, Daniel Alves de

Bibliographic details
Main author:	Sousa, Daniel Alves de
Publication date:	2020
Document type:	Master's dissertation
Language:	por
Source title:	Repositório Institucional da UFU
Full text:	https://repositorio.ufu.br/handle/123456789/29988 http://doi.org/10.14393/ufu.di.2020.657
Abstract:	A crucial aspect of information systems security is the deployment of security patches. The growing number of software vulnerabilities, together with the need to analyze the impact of each update, can cause administrators to postpone patching and leave their systems vulnerable for a long time. Furthermore, studies have shown that many software vulnerabilities have only proof-of-concept exploits, which makes identifying real threats even harder. In this scenario, knowing which vulnerabilities were exploited in the wild is a powerful tool to help systems administrators prioritize patches. Social media analysis can improve results and speed up detection by collecting data from online discussions and applying machine learning techniques to detect real-world exploits. In this dissertation, we use a technique that combines Twitter data with public database information to classify vulnerabilities as exploited or not exploited. We analyze the behavior of different classification algorithms, investigate how the choice of antivirus data used as ground truth influences the results, and experiment with various time window sizes. Our findings suggest that using a Light Gradient Boosting Machine (LightGBM) can benefit the results and that, in most cases, statistics about a tweet and the users who tweeted it are more informative than the tweet text. We also demonstrate the importance of using ground-truth data from security companies not mentioned in previous works.

Item metadata

id	UFU_e7bfaa00aef2986055c72c86be063e29
oai_identifier_str	oai:repositorio.ufu.br:123456789/29988
network_acronym_str	UFU
network_name_str	Repositório Institucional da UFU
repository_id_str
spelling	Descoberta de exploits usando dados da rede social Twitter Exploit discovery using Twitter social media data Segurança da Informação Aprendizado de máquina Vulnerabilidades de software Ameaças a computadores Exploits Redes sociais Antivírus Computer security Machine learning Software vulnerability Computer threats Exploits CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO Twitter (Rede social on-line) Software - Confiabilidade Aprendizado do computador Dissertação (Mestrado) In information security management, a crucial aspect is the installation of patches for software vulnerabilities. The growing number of these vulnerabilities, combined with the need to analyze the impact of each update, can lead administrators to postpone updates and leave their systems vulnerable for a long time. In addition, related studies indicate that many vulnerabilities are exploited only in proofs of concept, making the identification of real threats even harder. A technique that helps detect which vulnerabilities have real-world exploits can be a powerful tool for systems administrators. To speed up such detection, machine learning applied to discussions on social networks has shown promise. This dissertation applies machine learning techniques to data from Twitter discussions and public databases to determine whether or not a vulnerability has been exploited. The work also analyzes the behavior of different classification algorithms, investigates the influence of using ground-truth labels extracted from different antivirus companies, and experiments with training on several time window sizes. The findings suggest that using the Light Gradient Boosting Machine (LightGBM) ensemble and the All k-Nearest-Neighbor (AllKNN) class-balancing algorithm can benefit the results in terms of F-score and precision. The work also demonstrates how using labels extracted from a single antivirus company can bias the model. Universidade Federal de Uberlândia Brasil Programa de Pós-graduação em Ciência da Computação Paiva, Elaine Ribeiro de Faria http://lattes.cnpq.br/8238524390290386 Miani, Rodrigo Sanches http://lattes.cnpq.br/2992074747740327 Pasquini, Rafael Barbon Junior, Sylvio Sousa, Daniel Alves de 2020-10-01T12:30:36Z 2020-10-01T12:30:36Z 2020-08-28 info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis application/pdf SOUSA, Daniel Alves de. Descoberta de exploits usando dados da rede social Twitter. 2020. 99 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Uberlândia, Uberlândia, 2020. DOI http://doi.org/10.14393/ufu.di.2020.657 https://repositorio.ufu.br/handle/123456789/29988 http://doi.org/10.14393/ufu.di.2020.657 por http://creativecommons.org/licenses/by-nc-nd/3.0/us/ info:eu-repo/semantics/openAccess reponame:Repositório Institucional da UFU instname:Universidade Federal de Uberlândia (UFU) instacron:UFU 2020-10-02T06:17:50Z oai:repositorio.ufu.br:123456789/29988 Repositório Institucional ONG http://repositorio.ufu.br/oai/request diinf@dirbi.ufu.br opendoar:2020-10-02T06:17:50 Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU) false
dc.title.none.fl_str_mv	Descoberta de exploits usando dados da rede social Twitter Exploit discovery using Twitter social media data
title	Descoberta de exploits usando dados da rede social Twitter
spellingShingle	Descoberta de exploits usando dados da rede social Twitter Sousa, Daniel Alves de Segurança da Informação Aprendizado de máquina Vulnerabilidades de software Ameaças a computadores Exploits Redes sociais Antivírus Computer security Machine learning Software vulnerability Computer threats Exploits CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO Twitter (Rede social on-line) Software - Confiabilidade Aprendizado do computador
title_short	Descoberta de exploits usando dados da rede social Twitter
title_full	Descoberta de exploits usando dados da rede social Twitter
title_fullStr	Descoberta de exploits usando dados da rede social Twitter
title_full_unstemmed	Descoberta de exploits usando dados da rede social Twitter
title_sort	Descoberta de exploits usando dados da rede social Twitter
author	Sousa, Daniel Alves de
author_facet	Sousa, Daniel Alves de
author_role	author
dc.contributor.none.fl_str_mv	Paiva, Elaine Ribeiro de Faria http://lattes.cnpq.br/8238524390290386 Miani, Rodrigo Sanches http://lattes.cnpq.br/2992074747740327 Pasquini, Rafael Barbon Junior, Sylvio
dc.contributor.author.fl_str_mv	Sousa, Daniel Alves de
dc.subject.por.fl_str_mv	Segurança da Informação Aprendizado de máquina Vulnerabilidades de software Ameaças a computadores Exploits Redes sociais Antivírus Computer security Machine learning Software vulnerability Computer threats Exploits CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO Twitter (Rede social on-line) Software - Confiabilidade Aprendizado do computador
topic	Segurança da Informação Aprendizado de máquina Vulnerabilidades de software Ameaças a computadores Exploits Redes sociais Antivírus Computer security Machine learning Software vulnerability Computer threats Exploits CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO Twitter (Rede social on-line) Software - Confiabilidade Aprendizado do computador
description	A crucial aspect of information systems security is the deployment of security patches. The growing number of software vulnerabilities, together with the need to analyze the impact of each update, can cause administrators to postpone patching and leave their systems vulnerable for a long time. Furthermore, studies have shown that many software vulnerabilities have only proof-of-concept exploits, which makes identifying real threats even harder. In this scenario, knowing which vulnerabilities were exploited in the wild is a powerful tool to help systems administrators prioritize patches. Social media analysis can improve results and speed up detection by collecting data from online discussions and applying machine learning techniques to detect real-world exploits. In this dissertation, we use a technique that combines Twitter data with public database information to classify vulnerabilities as exploited or not exploited. We analyze the behavior of different classification algorithms, investigate how the choice of antivirus data used as ground truth influences the results, and experiment with various time window sizes. Our findings suggest that using a Light Gradient Boosting Machine (LightGBM) can benefit the results and that, in most cases, statistics about a tweet and the users who tweeted it are more informative than the tweet text. We also demonstrate the importance of using ground-truth data from security companies not mentioned in previous works.
publishDate	2020
dc.date.none.fl_str_mv	2020-10-01T12:30:36Z 2020-10-01T12:30:36Z 2020-08-28
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	SOUSA, Daniel Alves de. Descoberta de exploits usando dados da rede social Twitter. 2020. 99 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Uberlândia, Uberlândia, 2020. DOI http://doi.org/10.14393/ufu.di.2020.657 https://repositorio.ufu.br/handle/123456789/29988 http://doi.org/10.14393/ufu.di.2020.657
identifier_str_mv	SOUSA, Daniel Alves de. Descoberta de exploits usando dados da rede social Twitter. 2020. 99 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Uberlândia, Uberlândia, 2020. DOI http://doi.org/10.14393/ufu.di.2020.657
url	https://repositorio.ufu.br/handle/123456789/29988 http://doi.org/10.14393/ufu.di.2020.657
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by-nc-nd/3.0/us/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-nd/3.0/us/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Uberlândia Brasil Programa de Pós-graduação em Ciência da Computação
publisher.none.fl_str_mv	Universidade Federal de Uberlândia Brasil Programa de Pós-graduação em Ciência da Computação
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFU instname:Universidade Federal de Uberlândia (UFU) instacron:UFU
instname_str	Universidade Federal de Uberlândia (UFU)
instacron_str	UFU
institution	UFU
reponame_str	Repositório Institucional da UFU
collection	Repositório Institucional da UFU
repository.name.fl_str_mv	Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU)
repository.mail.fl_str_mv	diinf@dirbi.ufu.br
_version_	1805569609011560448
