Malware detection in macOS using supervised learning
Autor(a) principal: | |
---|---|
Data de Publicação: | 2022 |
Tipo de documento: | Dissertação |
Idioma: | eng |
Título da fonte: | Repositório Institucional da UFPE |
dARK ID: | ark:/64986/0013000015tj5 |
Texto Completo: | https://repositorio.ufpe.br/handle/123456789/46235 |
Resumo: | The development of macOS malware has grown significantly in recent years. Attackers have become more sophisticated and more targeted with the emergence of new dangerous malware families for macOS. However, since the malware detection problem is very dependent on the platform, solutions previously proposed for other operating systems cannot be directly used in macOS. Malware detection is one of the main pillars of endpoint security. Unfortunately, there are very few works on macOS endpoint security, which is considered a largely unexplored territory. Currently, the only malware detection mechanism in macOS is a signature-based system with less than 200 rules as of 2021, called XProtect. Recent works that attempted to improve the detection of malwares in macOS have methodology limitations, such as the lack of a large macOS malware dataset and issues that arise with imbalanced datasets. In this work, we bring the malware detection issue to the macOS operating system and evaluate how supervised machine learning algorithms can be used to improve endpoint security in the macOS ecosystem. We create a new and larger dataset of 631 malware and 10,141 benign software using public sources and extracting information from the Mach-O format. We evaluate the performance of seven different machine learning algorithms, two sampling strategies and four feature reduction techniques in the detection of malwares in macOS. As a result, we present models that are better than macOS native protections, with detection rates larger than 90% while maintaining a false alarm rate of less than 1%. The presented models successfully demonstrate that macOS security can be improved by using static characteristics of native executables in combination with common machine learning algorithms. |
id |
UFPE_d7cb35bf71e5f26dd01e117b14a2124b |
---|---|
oai_identifier_str |
oai:repositorio.ufpe.br:123456789/46235 |
network_acronym_str |
UFPE |
network_name_str |
Repositório Institucional da UFPE |
repository_id_str |
2221 |
spelling |
BURGARDT, Caio Augusto Pereirahttp://lattes.cnpq.br/0812104184657634http://lattes.cnpq.br/9838400375894439CAMPELO, Divanilson Rodrigo de Sousa2022-09-08T12:07:45Z2022-09-08T12:07:45Z2022-02-25BURGARDT, Caio Augusto Pereira. Malware detection in macOS using supervised learning. 2022. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2022.https://repositorio.ufpe.br/handle/123456789/46235ark:/64986/0013000015tj5The development of macOS malware has grown significantly in recent years. Attackers have become more sophisticated and more targeted with the emergence of new dangerous malware families for macOS. However, since the malware detection problem is very dependent on the platform, solutions previously proposed for other operating systems cannot be directly used in macOS. Malware detection is one of the main pillars of endpoint security. Unfortunately, there are very few works on macOS endpoint security, which is considered a largely unexplored territory. Currently, the only malware detection mechanism in macOS is a signature-based system with less than 200 rules as of 2021, called XProtect. Recent works that attempted to improve the detection of malwares in macOS have methodology limitations, such as the lack of a large macOS malware dataset and issues that arise with imbalanced datasets. In this work, we bring the malware detection issue to the macOS operating system and evaluate how supervised machine learning algorithms can be used to improve endpoint security in the macOS ecosystem. We create a new and larger dataset of 631 malware and 10,141 benign software using public sources and extracting information from the Mach-O format. We evaluate the performance of seven different machine learning algorithms, two sampling strategies and four feature reduction techniques in the detection of malwares in macOS. As a result, we present models that are better than macOS native protections, with detection rates larger than 90% while maintaining a false alarm rate of less than 1%. The presented models successfully demonstrate that macOS security can be improved by using static characteristics of native executables in combination with common machine learning algorithms.O desenvolvimento de malware para macOS cresceu significativamente nos últimos anos. Os invasores se tornaram mais sofisticados e mais direcionados com o surgimento de novas famílias de malware perigosas para o macOS. No entanto, como o problema de detecção de malware é muito dependente da plataforma, as soluções propostas para outros sistemas operacionais não podem ser usadas diretamente no macOS. A detecção de malware é um dos principais pilares da segurança de endpoints. Infelizmente, existem muito poucos trabalhos sobre a segurança de endpoint do macOS, que é considerada território pouco investigado.Atualmente, o único mecanismo de detecção de malware no macOS é um sistema baseado em assinaturas com menos de 200 regras em 2021, conhecido como XProtect. Trabalhos recentes que tentaram melhorar a detecção de malwares no macOS têm limitações de metodologia, como a falta de um grande conjunto de dados de malware do macOS e problemas que surgem com conjuntos de dados em classes desequilibradas.Nessa dissertação, trazemos o problema de detecção de malware para o sistema operacional macOS e avaliamos como algoritmos de aprendizado de máquina supervisionados podem ser usados para melhorar a segurança de end - point do ecossistema macOS. Criamos um novo dataset extraindo informações do formato Mach-O de 631 malwares e 10.141 softwares benignos de fontes públicas. Avaliamos o desempenho de sete algoritmos de aprendizagem de máquina em conjunto com duas estratégias de amostragem e quatro técnicas de redução de features para a detecção de malwares no macOS. Como resultado, apresentamos modelos melhores que as proteções nativas do macOS, com taxas de detecção superiores a 90% e taxas de alarmes falsos inferiores a 1%. Os modelos apresentados demonstram com sucesso que a segurança do macOS pode ser aprimorada usando características estáticas de executáveis nativos em combinação com algoritmos populares de aprendizagem de máquina.engUniversidade Federal de PernambucoPrograma de Pos Graduacao em Ciencia da ComputacaoUFPEBrasilhttp://creativecommons.org/licenses/by-nc-nd/3.0/br/info:eu-repo/semantics/openAccessRedes de ComputadoresAprendizagem de máquinaMalware detection in macOS using supervised learninginfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesismestradoreponame:Repositório Institucional da UFPEinstname:Universidade Federal de Pernambuco (UFPE)instacron:UFPEORIGINALDISSERTAÇÃO Caio Augusto Pereira Burgardt.pdfDISSERTAÇÃO Caio Augusto Pereira Burgardt.pdfapplication/pdf1158998https://repositorio.ufpe.br/bitstream/123456789/46235/1/DISSERTA%c3%87%c3%83O%20Caio%20Augusto%20Pereira%20Burgardt.pdf85e278dcc9f10e9021c4e8458beb703cMD51LICENSElicense.txtlicense.txttext/plain; charset=utf-82142https://repositorio.ufpe.br/bitstream/123456789/46235/3/license.txt6928b9260b07fb2755249a5ca9903395MD53CC-LICENSElicense_rdflicense_rdfapplication/rdf+xml; charset=utf-8811https://repositorio.ufpe.br/bitstream/123456789/46235/2/license_rdfe39d27027a6cc9cb039ad269a5db8e34MD52TEXTDISSERTAÇÃO Caio Augusto Pereira Burgardt.pdf.txtDISSERTAÇÃO Caio Augusto Pereira Burgardt.pdf.txtExtracted texttext/plain80386https://repositorio.ufpe.br/bitstream/123456789/46235/4/DISSERTA%c3%87%c3%83O%20Caio%20Augusto%20Pereira%20Burgardt.pdf.txt0dc28a3ef74fa0deababd8aa40a3a7d0MD54THUMBNAILDISSERTAÇÃO Caio Augusto Pereira Burgardt.pdf.jpgDISSERTAÇÃO Caio Augusto Pereira Burgardt.pdf.jpgGenerated Thumbnailimage/jpeg1220https://repositorio.ufpe.br/bitstream/123456789/46235/5/DISSERTA%c3%87%c3%83O%20Caio%20Augusto%20Pereira%20Burgardt.pdf.jpgb33b6f2d3e1a1781bb0d295b5262eb20MD55123456789/462352022-09-09 03:05:11.253oai:repositorio.ufpe.br:123456789/46235VGVybW8gZGUgRGVww7NzaXRvIExlZ2FsIGUgQXV0b3JpemHDp8OjbyBwYXJhIFB1YmxpY2HDp8OjbyBkZSBEb2N1bWVudG9zIG5vIFJlcG9zaXTDs3JpbyBEaWdpdGFsIGRhIFVGUEUKIAoKRGVjbGFybyBlc3RhciBjaWVudGUgZGUgcXVlIGVzdGUgVGVybW8gZGUgRGVww7NzaXRvIExlZ2FsIGUgQXV0b3JpemHDp8OjbyB0ZW0gbyBvYmpldGl2byBkZSBkaXZ1bGdhw6fDo28gZG9zIGRvY3VtZW50b3MgZGVwb3NpdGFkb3Mgbm8gUmVwb3NpdMOzcmlvIERpZ2l0YWwgZGEgVUZQRSBlIGRlY2xhcm8gcXVlOgoKSSAtICBvIGNvbnRlw7pkbyBkaXNwb25pYmlsaXphZG8gw6kgZGUgcmVzcG9uc2FiaWxpZGFkZSBkZSBzdWEgYXV0b3JpYTsKCklJIC0gbyBjb250ZcO6ZG8gw6kgb3JpZ2luYWwsIGUgc2UgbyB0cmFiYWxobyBlL291IHBhbGF2cmFzIGRlIG91dHJhcyBwZXNzb2FzIGZvcmFtIHV0aWxpemFkb3MsIGVzdGFzIGZvcmFtIGRldmlkYW1lbnRlIHJlY29uaGVjaWRhczsKCklJSSAtIHF1YW5kbyB0cmF0YXItc2UgZGUgVHJhYmFsaG8gZGUgQ29uY2x1c8OjbyBkZSBDdXJzbywgRGlzc2VydGHDp8OjbyBvdSBUZXNlOiBvIGFycXVpdm8gZGVwb3NpdGFkbyBjb3JyZXNwb25kZSDDoCB2ZXJzw6NvIGZpbmFsIGRvIHRyYWJhbGhvOwoKSVYgLSBxdWFuZG8gdHJhdGFyLXNlIGRlIFRyYWJhbGhvIGRlIENvbmNsdXPDo28gZGUgQ3Vyc28sIERpc3NlcnRhw6fDo28gb3UgVGVzZTogZXN0b3UgY2llbnRlIGRlIHF1ZSBhIGFsdGVyYcOnw6NvIGRhIG1vZGFsaWRhZGUgZGUgYWNlc3NvIGFvIGRvY3VtZW50byBhcMOzcyBvIGRlcMOzc2l0byBlIGFudGVzIGRlIGZpbmRhciBvIHBlcsOtb2RvIGRlIGVtYmFyZ28sIHF1YW5kbyBmb3IgZXNjb2xoaWRvIGFjZXNzbyByZXN0cml0bywgc2Vyw6EgcGVybWl0aWRhIG1lZGlhbnRlIHNvbGljaXRhw6fDo28gZG8gKGEpIGF1dG9yIChhKSBhbyBTaXN0ZW1hIEludGVncmFkbyBkZSBCaWJsaW90ZWNhcyBkYSBVRlBFIChTSUIvVUZQRSkuCgogClBhcmEgdHJhYmFsaG9zIGVtIEFjZXNzbyBBYmVydG86CgpOYSBxdWFsaWRhZGUgZGUgdGl0dWxhciBkb3MgZGlyZWl0b3MgYXV0b3JhaXMgZGUgYXV0b3IgcXVlIHJlY2FlbSBzb2JyZSBlc3RlIGRvY3VtZW50bywgZnVuZGFtZW50YWRvIG5hIExlaSBkZSBEaXJlaXRvIEF1dG9yYWwgbm8gOS42MTAsIGRlIDE5IGRlIGZldmVyZWlybyBkZSAxOTk4LCBhcnQuIDI5LCBpbmNpc28gSUlJLCBhdXRvcml6byBhIFVuaXZlcnNpZGFkZSBGZWRlcmFsIGRlIFBlcm5hbWJ1Y28gYSBkaXNwb25pYmlsaXphciBncmF0dWl0YW1lbnRlLCBzZW0gcmVzc2FyY2ltZW50byBkb3MgZGlyZWl0b3MgYXV0b3JhaXMsIHBhcmEgZmlucyBkZSBsZWl0dXJhLCBpbXByZXNzw6NvIGUvb3UgZG93bmxvYWQgKGFxdWlzacOnw6NvKSBhdHJhdsOpcyBkbyBzaXRlIGRvIFJlcG9zaXTDs3JpbyBEaWdpdGFsIGRhIFVGUEUgbm8gZW5kZXJlw6dvIGh0dHA6Ly93d3cucmVwb3NpdG9yaW8udWZwZS5iciwgYSBwYXJ0aXIgZGEgZGF0YSBkZSBkZXDDs3NpdG8uCgogClBhcmEgdHJhYmFsaG9zIGVtIEFjZXNzbyBSZXN0cml0bzoKCk5hIHF1YWxpZGFkZSBkZSB0aXR1bGFyIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBkZSBhdXRvciBxdWUgcmVjYWVtIHNvYnJlIGVzdGUgZG9jdW1lbnRvLCBmdW5kYW1lbnRhZG8gbmEgTGVpIGRlIERpcmVpdG8gQXV0b3JhbCBubyA5LjYxMCBkZSAxOSBkZSBmZXZlcmVpcm8gZGUgMTk5OCwgYXJ0LiAyOSwgaW5jaXNvIElJSSwgYXV0b3Jpem8gYSBVbml2ZXJzaWRhZGUgRmVkZXJhbCBkZSBQZXJuYW1idWNvIGEgZGlzcG9uaWJpbGl6YXIgZ3JhdHVpdGFtZW50ZSwgc2VtIHJlc3NhcmNpbWVudG8gZG9zIGRpcmVpdG9zIGF1dG9yYWlzLCBwYXJhIGZpbnMgZGUgbGVpdHVyYSwgaW1wcmVzc8OjbyBlL291IGRvd25sb2FkIChhcXVpc2nDp8OjbykgYXRyYXbDqXMgZG8gc2l0ZSBkbyBSZXBvc2l0w7NyaW8gRGlnaXRhbCBkYSBVRlBFIG5vIGVuZGVyZcOnbyBodHRwOi8vd3d3LnJlcG9zaXRvcmlvLnVmcGUuYnIsIHF1YW5kbyBmaW5kYXIgbyBwZXLDrW9kbyBkZSBlbWJhcmdvIGNvbmRpemVudGUgYW8gdGlwbyBkZSBkb2N1bWVudG8sIGNvbmZvcm1lIGluZGljYWRvIG5vIGNhbXBvIERhdGEgZGUgRW1iYXJnby4KRepositório InstitucionalPUBhttps://repositorio.ufpe.br/oai/requestattena@ufpe.bropendoar:22212022-09-09T06:05:11Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)false |
dc.title.pt_BR.fl_str_mv |
Malware detection in macOS using supervised learning |
title |
Malware detection in macOS using supervised learning |
spellingShingle |
Malware detection in macOS using supervised learning BURGARDT, Caio Augusto Pereira Redes de Computadores Aprendizagem de máquina |
title_short |
Malware detection in macOS using supervised learning |
title_full |
Malware detection in macOS using supervised learning |
title_fullStr |
Malware detection in macOS using supervised learning |
title_full_unstemmed |
Malware detection in macOS using supervised learning |
title_sort |
Malware detection in macOS using supervised learning |
author |
BURGARDT, Caio Augusto Pereira |
author_facet |
BURGARDT, Caio Augusto Pereira |
author_role |
author |
dc.contributor.authorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/0812104184657634 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/9838400375894439 |
dc.contributor.author.fl_str_mv |
BURGARDT, Caio Augusto Pereira |
dc.contributor.advisor1.fl_str_mv |
CAMPELO, Divanilson Rodrigo de Sousa |
contributor_str_mv |
CAMPELO, Divanilson Rodrigo de Sousa |
dc.subject.por.fl_str_mv |
Redes de Computadores Aprendizagem de máquina |
topic |
Redes de Computadores Aprendizagem de máquina |
description |
The development of macOS malware has grown significantly in recent years. Attackers have become more sophisticated and more targeted with the emergence of new dangerous malware families for macOS. However, since the malware detection problem is very dependent on the platform, solutions previously proposed for other operating systems cannot be directly used in macOS. Malware detection is one of the main pillars of endpoint security. Unfortunately, there are very few works on macOS endpoint security, which is considered a largely unexplored territory. Currently, the only malware detection mechanism in macOS is a signature-based system with less than 200 rules as of 2021, called XProtect. Recent works that attempted to improve the detection of malwares in macOS have methodology limitations, such as the lack of a large macOS malware dataset and issues that arise with imbalanced datasets. In this work, we bring the malware detection issue to the macOS operating system and evaluate how supervised machine learning algorithms can be used to improve endpoint security in the macOS ecosystem. We create a new and larger dataset of 631 malware and 10,141 benign software using public sources and extracting information from the Mach-O format. We evaluate the performance of seven different machine learning algorithms, two sampling strategies and four feature reduction techniques in the detection of malwares in macOS. As a result, we present models that are better than macOS native protections, with detection rates larger than 90% while maintaining a false alarm rate of less than 1%. The presented models successfully demonstrate that macOS security can be improved by using static characteristics of native executables in combination with common machine learning algorithms. |
publishDate |
2022 |
dc.date.accessioned.fl_str_mv |
2022-09-08T12:07:45Z |
dc.date.available.fl_str_mv |
2022-09-08T12:07:45Z |
dc.date.issued.fl_str_mv |
2022-02-25 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
BURGARDT, Caio Augusto Pereira. Malware detection in macOS using supervised learning. 2022. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2022. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufpe.br/handle/123456789/46235 |
dc.identifier.dark.fl_str_mv |
ark:/64986/0013000015tj5 |
identifier_str_mv |
BURGARDT, Caio Augusto Pereira. Malware detection in macOS using supervised learning. 2022. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2022. ark:/64986/0013000015tj5 |
url |
https://repositorio.ufpe.br/handle/123456789/46235 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.publisher.program.fl_str_mv |
Programa de Pos Graduacao em Ciencia da Computacao |
dc.publisher.initials.fl_str_mv |
UFPE |
dc.publisher.country.fl_str_mv |
Brasil |
publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE |
instname_str |
Universidade Federal de Pernambuco (UFPE) |
instacron_str |
UFPE |
institution |
UFPE |
reponame_str |
Repositório Institucional da UFPE |
collection |
Repositório Institucional da UFPE |
bitstream.url.fl_str_mv |
https://repositorio.ufpe.br/bitstream/123456789/46235/1/DISSERTA%c3%87%c3%83O%20Caio%20Augusto%20Pereira%20Burgardt.pdf https://repositorio.ufpe.br/bitstream/123456789/46235/3/license.txt https://repositorio.ufpe.br/bitstream/123456789/46235/2/license_rdf https://repositorio.ufpe.br/bitstream/123456789/46235/4/DISSERTA%c3%87%c3%83O%20Caio%20Augusto%20Pereira%20Burgardt.pdf.txt https://repositorio.ufpe.br/bitstream/123456789/46235/5/DISSERTA%c3%87%c3%83O%20Caio%20Augusto%20Pereira%20Burgardt.pdf.jpg |
bitstream.checksum.fl_str_mv |
85e278dcc9f10e9021c4e8458beb703c 6928b9260b07fb2755249a5ca9903395 e39d27027a6cc9cb039ad269a5db8e34 0dc28a3ef74fa0deababd8aa40a3a7d0 b33b6f2d3e1a1781bb0d295b5262eb20 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
repository.mail.fl_str_mv |
attena@ufpe.br |
_version_ |
1814448451199434752 |