Malware classification on time series data through machine learning

Detalhes bibliográficos
Autor(a) principal: Diogo Moutinho de Almeida
Data de Publicação: 2016
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: https://hdl.handle.net/10216/85701
Resumo: Malware classification can be a challenge considering the great amount of variety and increasing emergence of malware, as well as, available classification methods. For this reason, it is not unusual for a file to be considered a different type of malicious file by different classifiers. In fact, an assignment made by a single classifier might change through time, as a consequence of methods refinements or new discoveries. When using multiple independent classifiers, past classifications of a certain file might help on deciding on which one to trust. This dissertation aims at finding a way to facilitate this analysis by collecting historical data on files that already have assigned their final and last classification, and determine which machine learning algorithm can better predict a new file classification given this very same data. Besides the historical data, other characteristics shall be taken into account like: source of the file, filetype and filesize. The machine learning algorithms we have used are: C4.5, Random Forests, Multi-Layer Perceptron (MLP) and Long short-term memory (LSTM). It was possible with this approach to find an alternative way in finding the correct malware classification of a file, given a multiple number of classifiers, taking into account its classification history.
id RCAP_6f7fc6a4d6c306238bae3bdb324870e4
oai_identifier_str oai:repositorio-aberto.up.pt:10216/85701
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Malware classification on time series data through machine learningEngenharia electrotécnica, electrónica e informáticaElectrical engineering, Electronic engineering, Information engineeringMalware classification can be a challenge considering the great amount of variety and increasing emergence of malware, as well as, available classification methods. For this reason, it is not unusual for a file to be considered a different type of malicious file by different classifiers. In fact, an assignment made by a single classifier might change through time, as a consequence of methods refinements or new discoveries. When using multiple independent classifiers, past classifications of a certain file might help on deciding on which one to trust. This dissertation aims at finding a way to facilitate this analysis by collecting historical data on files that already have assigned their final and last classification, and determine which machine learning algorithm can better predict a new file classification given this very same data. Besides the historical data, other characteristics shall be taken into account like: source of the file, filetype and filesize. The machine learning algorithms we have used are: C4.5, Random Forests, Multi-Layer Perceptron (MLP) and Long short-term memory (LSTM). It was possible with this approach to find an alternative way in finding the correct malware classification of a file, given a multiple number of classifiers, taking into account its classification history.2016-07-122016-07-12T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttps://hdl.handle.net/10216/85701TID:201301636engDiogo Moutinho de Almeidainfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-11-29T13:03:46Zoai:repositorio-aberto.up.pt:10216/85701Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T23:32:52.909266Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Malware classification on time series data through machine learning
title Malware classification on time series data through machine learning
spellingShingle Malware classification on time series data through machine learning
Diogo Moutinho de Almeida
Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
title_short Malware classification on time series data through machine learning
title_full Malware classification on time series data through machine learning
title_fullStr Malware classification on time series data through machine learning
title_full_unstemmed Malware classification on time series data through machine learning
title_sort Malware classification on time series data through machine learning
author Diogo Moutinho de Almeida
author_facet Diogo Moutinho de Almeida
author_role author
dc.contributor.author.fl_str_mv Diogo Moutinho de Almeida
dc.subject.por.fl_str_mv Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
topic Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
description Malware classification can be a challenge considering the great amount of variety and increasing emergence of malware, as well as, available classification methods. For this reason, it is not unusual for a file to be considered a different type of malicious file by different classifiers. In fact, an assignment made by a single classifier might change through time, as a consequence of methods refinements or new discoveries. When using multiple independent classifiers, past classifications of a certain file might help on deciding on which one to trust. This dissertation aims at finding a way to facilitate this analysis by collecting historical data on files that already have assigned their final and last classification, and determine which machine learning algorithm can better predict a new file classification given this very same data. Besides the historical data, other characteristics shall be taken into account like: source of the file, filetype and filesize. The machine learning algorithms we have used are: C4.5, Random Forests, Multi-Layer Perceptron (MLP) and Long short-term memory (LSTM). It was possible with this approach to find an alternative way in finding the correct malware classification of a file, given a multiple number of classifiers, taking into account its classification history.
publishDate 2016
dc.date.none.fl_str_mv 2016-07-12
2016-07-12T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/85701
TID:201301636
url https://hdl.handle.net/10216/85701
identifier_str_mv TID:201301636
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799135639310958593