Malware classification on time series data through machine learning

Diogo Moutinho de Almeida

Bibliographic details
Main author:	Diogo Moutinho de Almeida
Publication date:	2016
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://hdl.handle.net/10216/85701
Abstract:	Malware classification can be a challenge given the variety and growing volume of malware, as well as the range of available classification methods. For this reason, it is not unusual for different classifiers to assign the same file to different types of malware. In fact, the assignment made by a single classifier may change over time as a consequence of refinements to its methods or of new discoveries. When multiple independent classifiers are used, the past classifications of a file may help decide which classifier to trust. This dissertation aims to facilitate this analysis by collecting historical data on files that have already been assigned their final classification, and by determining which machine learning algorithm best predicts the classification of a new file from this same data. Besides the historical data, other characteristics are taken into account: the source of the file, its file type and its file size. The machine learning algorithms used are C4.5, Random Forests, Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM). With this approach it was possible to find an alternative way of arriving at the correct malware classification of a file, given multiple classifiers and taking its classification history into account.
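
The abstract names the model inputs (classification history from several classifiers, plus source, file type and file size) but this record does not say how they were encoded. The following is only a minimal sketch of one plausible encoding, not the dissertation's method: the label set, the `FileRecord` type, the `encode` function, the window length and the log scaling of the file size are all assumptions made for illustration. It flattens the most recent history steps into a fixed-length vector, the kind of input a decision tree, random forest or MLP would accept.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical label set: this record does not list the dissertation's classes.
LABELS = ["clean", "trojan", "worm", "adware"]


@dataclass
class FileRecord:
    """Hypothetical container for one file's data (not from the source)."""
    history: List[List[str]]  # history[t][k]: label from classifier k at step t
    source: str
    filetype: str
    filesize: int  # bytes


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a 0/1 vector marking `value` in `vocabulary` (all zeros if unknown)."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode(record: FileRecord, n_classifiers: int, window: int,
           sources: Sequence[str], filetypes: Sequence[str]) -> List[float]:
    """Flatten the last `window` (> 0) history steps plus static attributes."""
    features: List[float] = []
    recent = record.history[-window:]
    step_width = n_classifiers * len(LABELS)
    # Left-pad short histories with zero steps so every vector has equal length.
    for _ in range(window - len(recent)):
        features.extend([0.0] * step_width)
    # One one-hot block per classifier per time step, oldest step first.
    for step in recent:
        for label in step:
            features.extend(one_hot(label, LABELS))
    features.extend(one_hot(record.source, sources))
    features.extend(one_hot(record.filetype, filetypes))
    # Assumed scaling: compress the wide range of file sizes with a logarithm.
    features.append(math.log10(record.filesize + 1))
    return features


record = FileRecord(
    history=[["trojan", "clean"], ["trojan", "trojan"]],
    source="email", filetype="exe", filesize=999)
vector = encode(record, n_classifiers=2, window=3,
                sources=["email", "web"], filetypes=["exe", "pdf"])
print(len(vector))
```

For a sequence model such as an LSTM, the per-step blocks would presumably be kept as a sequence of vectors rather than flattened into one.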

Item metadata

id	RCAP_6f7fc6a4d6c306238bae3bdb324870e4
oai_identifier_str	oai:repositorio-aberto.up.pt:10216/85701
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Malware classification on time series data through machine learning
author	Diogo Moutinho de Almeida
author_facet	Diogo Moutinho de Almeida
author_role	author
dc.contributor.author.fl_str_mv	Diogo Moutinho de Almeida
dc.subject.por.fl_str_mv	Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering
topic	Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering
publishDate	2016
dc.date.none.fl_str_mv	2016-07-12 2016-07-12T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10216/85701 TID:201301636
url	https://hdl.handle.net/10216/85701
identifier_str_mv	TID:201301636
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799135639310958593
