Malware detection methods for Android mobile applications

Detalhes bibliográficos
Autor(a) principal: Lopes, João Pedro Lapa da Silva
Data de Publicação: 2020
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10071/22189
Resumo: Advancements in mobile computing are attracting traditional device users to transition toward mobile platforms to fulfil their data processing needs. Among these, the Android platform is the most popular, holding the majority of the market share due to its open-source policy and ability to install applications from different application stores. This fact, coupled with the amount of sensitive data these devices now store, makes it attractive for malware authors to attack the Android platform, causing a large influx of malicious applications in the ecosystem. Traditional malware detection methods cannot effectively control and prevent this influx, demanding an automatic and intelligent approach such as machine learning. In this thesis, three machine learning algorithms, XGBoost, SVM and K-NN were trained with several features, with a focus on Android permissions , to measure the effectiveness of applying machine learning techniques to combat the proliferation of malware. Given goodware to malware ratio of 99/1, four experiments with an under-sampled version of the dataset with a ratio of 70/30 were conducted to test different subsets of the feature space as well as feature elimination and aggregation before training the algorithms with the full set of features using feature normalization across two distinct scenarios. This approach showed promising results, with XGBoost, SVM and K-NN distinguishing between malware and goodware with a score of 90 % (Area Under the Receiver Operating Curve values).
id RCAP_0f82323247986b559b57c34fbacfba3b
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/22189
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Malware detection methods for Android mobile applicationsMalware detectionMachine learningAndroidSecurityMobileDeteção de malwareAprendizagem automáticaSegurançaMóvelAdvancements in mobile computing are attracting traditional device users to transition toward mobile platforms to fulfil their data processing needs. Among these, the Android platform is the most popular, holding the majority of the market share due to its open-source policy and ability to install applications from different application stores. This fact, coupled with the amount of sensitive data these devices now store, makes it attractive for malware authors to attack the Android platform, causing a large influx of malicious applications in the ecosystem. Traditional malware detection methods cannot effectively control and prevent this influx, demanding an automatic and intelligent approach such as machine learning. In this thesis, three machine learning algorithms, XGBoost, SVM and K-NN were trained with several features, with a focus on Android permissions , to measure the effectiveness of applying machine learning techniques to combat the proliferation of malware. Given goodware to malware ratio of 99/1, four experiments with an under-sampled version of the dataset with a ratio of 70/30 were conducted to test different subsets of the feature space as well as feature elimination and aggregation before training the algorithms with the full set of features using feature normalization across two distinct scenarios. This approach showed promising results, with XGBoost, SVM and K-NN distinguishing between malware and goodware with a score of 90 % (Area Under the Receiver Operating Curve values).Os avanços na computação móvel estão a atrair utilizadores de dispositivos tradicionais a transitar para as plataformas móveis para atender às suas necessidades de processamento de dados. Entre estas, a plataforma Android é a mais popular, detendo a maioria da quota de mercado devido à sua política open-source e capacidade de instalar aplicações através de várias lojas de aplicações. Este facto, conjuntamente com a quantidade de dados sensíveis que estes dispositivos agora armazenam, torna o ataque à plataforma Android atraente para os autores de malware, causando um grande fluxo de aplicações maliciosas no ecossistema. Os métodos tradicionais de deteção de malware não conseguem controlar e prevenir este fluxo eficazmente, exigindo uma abordagem automática e inteligente, como a aprendizagem automática. Nesta tese, três algoritmos de aprendizagem automática, XGBoost, SVM e K-NN, foram treinados com diversas características, focando-se nas permissões Android e características estáticas das aplicações, para medir a eficácia da aplicação de técnicas de aprendizagem automática no combate à proliferação de malware. Dado o rácio de goodware para malware de 99/1 do conjunto de dados, realizaram-se quatro experiências com uma versão subamostrada do mesmo com um rácio de 70/30 para testar diferentes subconjuntos do espaço de características bem como eliminação e agregação de características antes de treinar os algoritmos com o conjunto completo de características usando normalização de características em dois cenários. Esta abordagem apresentou resultados promissores, com XGBoost, SVM e K-NN distinguindo entre malware e goodware com um score de 90 % (valores Area Under the Receiver Operating Curve).2021-02-24T16:02:33Z2020-10-23T00:00:00Z2020-10-232020-09info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10071/22189TID:202645371engLopes, João Pedro Lapa da Silvainfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-11-09T17:59:38Zoai:repositorio.iscte-iul.pt:10071/22189Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T22:31:20.455270Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Malware detection methods for Android mobile applications
title Malware detection methods for Android mobile applications
spellingShingle Malware detection methods for Android mobile applications
Lopes, João Pedro Lapa da Silva
Malware detection
Machine learning
Android
Security
Mobile
Deteção de malware
Aprendizagem automática
Segurança
Móvel
title_short Malware detection methods for Android mobile applications
title_full Malware detection methods for Android mobile applications
title_fullStr Malware detection methods for Android mobile applications
title_full_unstemmed Malware detection methods for Android mobile applications
title_sort Malware detection methods for Android mobile applications
author Lopes, João Pedro Lapa da Silva
author_facet Lopes, João Pedro Lapa da Silva
author_role author
dc.contributor.author.fl_str_mv Lopes, João Pedro Lapa da Silva
dc.subject.por.fl_str_mv Malware detection
Machine learning
Android
Security
Mobile
Deteção de malware
Aprendizagem automática
Segurança
Móvel
topic Malware detection
Machine learning
Android
Security
Mobile
Deteção de malware
Aprendizagem automática
Segurança
Móvel
description Advancements in mobile computing are attracting traditional device users to transition toward mobile platforms to fulfil their data processing needs. Among these, the Android platform is the most popular, holding the majority of the market share due to its open-source policy and ability to install applications from different application stores. This fact, coupled with the amount of sensitive data these devices now store, makes it attractive for malware authors to attack the Android platform, causing a large influx of malicious applications in the ecosystem. Traditional malware detection methods cannot effectively control and prevent this influx, demanding an automatic and intelligent approach such as machine learning. In this thesis, three machine learning algorithms, XGBoost, SVM and K-NN were trained with several features, with a focus on Android permissions , to measure the effectiveness of applying machine learning techniques to combat the proliferation of malware. Given goodware to malware ratio of 99/1, four experiments with an under-sampled version of the dataset with a ratio of 70/30 were conducted to test different subsets of the feature space as well as feature elimination and aggregation before training the algorithms with the full set of features using feature normalization across two distinct scenarios. This approach showed promising results, with XGBoost, SVM and K-NN distinguishing between malware and goodware with a score of 90 % (Area Under the Receiver Operating Curve values).
publishDate 2020
dc.date.none.fl_str_mv 2020-10-23T00:00:00Z
2020-10-23
2020-09
2021-02-24T16:02:33Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/22189
TID:202645371
url http://hdl.handle.net/10071/22189
identifier_str_mv TID:202645371
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134875450605568