Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets

Detalhes bibliográficos
Autor(a) principal: Neto, Miguel Ângelo Silva
Data de Publicação: 2013
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10400.6/3885
Resumo: Nowadays, traffic classification constitutes one of the most important resources in the task of managing computer networks. The tools and techniques that enable network traffic to be segregated into classes are critical for administrators to maintain their networks operating at the required Quality of Service (QoS) and security levels. Nonetheless, the steady evolution of the infrastructure and mainly of the terminal devices, as well as the consequent increase of the complexity of the networks, make this task a lot harder to achieve, both in terms of accuracy and computational requirements. Some of the factors that most prejudice traffic classification are the adoption of encryption and evasive techniques, employed by network applications. Several researchers have thus been focusing efforts in finding new means to classify traffic or improve the existing ones. This dissertation discusses a research work on the network traffic classification subject, focused on the segregation of network flows according to the application that generated them, independently of the fact that such applications use different communication paradigms. For achieving that purpose, a network scenario similar to a real one was setup on a lab environment, and several traffic traces generated using different contemporary applications were collected. This traces were initially subject to human analysis, which enabled the identification of behavior patterns without resorting to information inside the contents of the packets, using only the empirical distribution of the size of the packets. After the initial analysis, a set of signatures composed by the aforementioned empirical distributions and the name of respective applications was build, for each one of the applications and type of traffic under analysis. Subsequently, the best means to obtain the correspondence between the signatures and the network traffic in real-time and in a packet-by-packet manner was investigated, from which resulted the modification of two statistical tests known as Chi- Squared and Kolmogorov-Smirnov, later implemented in prototypes for traffic classification. To enable the packet-by-packet analysis, the two statistics of the aforementioned tests are calculated for a sliding window of values, which iterates each time a new packet of the flow arrives. The number of operations involved in the actualization of the statistics is constant and low, which enables obtaining a classification at any given moment of the duration of a flow. Each one of the two classification methods was implemented in a different prototype and then combined, using an heuristic, to obtain a third classifier. The classifiers were tested and evaluated separately resorting to new traffic traces, generated by the different applications considered in the study, captured in a network aggregation point. Even though the results obtained for each one of the two classifiers were good, presenting an accuracy above 70%, the combination of the two methods improves those results, correctly classifying more than 90% of the analysed flows. Additionally, the developed prototypes were compared with other similar tools discussed on the related literature and available online, and it was verified that, in many cases, the proposed classifiers produce better results for the analysed traces.
id RCAP_d079aa930e8d4f0acd362e04a9088314
oai_identifier_str oai:ubibliorum.ubi.pt:10400.6/3885
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packetsTráfego de rede - ClassificaçãoTráfego no escuro - ClassificaçãoTráfego de rede - MonitorizaçãoTeste Chi-Quadrado - EstatísticaTeste Kolmogorov-Smirnov - EstatísticaDomínio/Área Científica:Engenharia e TecnologiaNowadays, traffic classification constitutes one of the most important resources in the task of managing computer networks. The tools and techniques that enable network traffic to be segregated into classes are critical for administrators to maintain their networks operating at the required Quality of Service (QoS) and security levels. Nonetheless, the steady evolution of the infrastructure and mainly of the terminal devices, as well as the consequent increase of the complexity of the networks, make this task a lot harder to achieve, both in terms of accuracy and computational requirements. Some of the factors that most prejudice traffic classification are the adoption of encryption and evasive techniques, employed by network applications. Several researchers have thus been focusing efforts in finding new means to classify traffic or improve the existing ones. This dissertation discusses a research work on the network traffic classification subject, focused on the segregation of network flows according to the application that generated them, independently of the fact that such applications use different communication paradigms. For achieving that purpose, a network scenario similar to a real one was setup on a lab environment, and several traffic traces generated using different contemporary applications were collected. This traces were initially subject to human analysis, which enabled the identification of behavior patterns without resorting to information inside the contents of the packets, using only the empirical distribution of the size of the packets. After the initial analysis, a set of signatures composed by the aforementioned empirical distributions and the name of respective applications was build, for each one of the applications and type of traffic under analysis. Subsequently, the best means to obtain the correspondence between the signatures and the network traffic in real-time and in a packet-by-packet manner was investigated, from which resulted the modification of two statistical tests known as Chi- Squared and Kolmogorov-Smirnov, later implemented in prototypes for traffic classification. To enable the packet-by-packet analysis, the two statistics of the aforementioned tests are calculated for a sliding window of values, which iterates each time a new packet of the flow arrives. The number of operations involved in the actualization of the statistics is constant and low, which enables obtaining a classification at any given moment of the duration of a flow. Each one of the two classification methods was implemented in a different prototype and then combined, using an heuristic, to obtain a third classifier. The classifiers were tested and evaluated separately resorting to new traffic traces, generated by the different applications considered in the study, captured in a network aggregation point. Even though the results obtained for each one of the two classifiers were good, presenting an accuracy above 70%, the combination of the two methods improves those results, correctly classifying more than 90% of the analysed flows. Additionally, the developed prototypes were compared with other similar tools discussed on the related literature and available online, and it was verified that, in many cases, the proposed classifiers produce better results for the analysed traces.Inácio, Pedro Ricardo MoraisuBibliorumNeto, Miguel Ângelo Silva2015-10-29T15:05:07Z2013-062013-06-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10400.6/3885enginfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-12-15T09:40:29Zoai:ubibliorum.ubi.pt:10400.6/3885Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T00:45:12.687673Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
title Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
spellingShingle Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
Neto, Miguel Ângelo Silva
Tráfego de rede - Classificação
Tráfego no escuro - Classificação
Tráfego de rede - Monitorização
Teste Chi-Quadrado - Estatística
Teste Kolmogorov-Smirnov - Estatística
Domínio/Área Científica:Engenharia e Tecnologia
title_short Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
title_full Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
title_fullStr Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
title_full_unstemmed Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
title_sort Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
author Neto, Miguel Ângelo Silva
author_facet Neto, Miguel Ângelo Silva
author_role author
dc.contributor.none.fl_str_mv Inácio, Pedro Ricardo Morais
uBibliorum
dc.contributor.author.fl_str_mv Neto, Miguel Ângelo Silva
dc.subject.por.fl_str_mv Tráfego de rede - Classificação
Tráfego no escuro - Classificação
Tráfego de rede - Monitorização
Teste Chi-Quadrado - Estatística
Teste Kolmogorov-Smirnov - Estatística
Domínio/Área Científica:Engenharia e Tecnologia
topic Tráfego de rede - Classificação
Tráfego no escuro - Classificação
Tráfego de rede - Monitorização
Teste Chi-Quadrado - Estatística
Teste Kolmogorov-Smirnov - Estatística
Domínio/Área Científica:Engenharia e Tecnologia
description Nowadays, traffic classification constitutes one of the most important resources in the task of managing computer networks. The tools and techniques that enable network traffic to be segregated into classes are critical for administrators to maintain their networks operating at the required Quality of Service (QoS) and security levels. Nonetheless, the steady evolution of the infrastructure and mainly of the terminal devices, as well as the consequent increase of the complexity of the networks, make this task a lot harder to achieve, both in terms of accuracy and computational requirements. Some of the factors that most prejudice traffic classification are the adoption of encryption and evasive techniques, employed by network applications. Several researchers have thus been focusing efforts in finding new means to classify traffic or improve the existing ones. This dissertation discusses a research work on the network traffic classification subject, focused on the segregation of network flows according to the application that generated them, independently of the fact that such applications use different communication paradigms. For achieving that purpose, a network scenario similar to a real one was setup on a lab environment, and several traffic traces generated using different contemporary applications were collected. This traces were initially subject to human analysis, which enabled the identification of behavior patterns without resorting to information inside the contents of the packets, using only the empirical distribution of the size of the packets. After the initial analysis, a set of signatures composed by the aforementioned empirical distributions and the name of respective applications was build, for each one of the applications and type of traffic under analysis. Subsequently, the best means to obtain the correspondence between the signatures and the network traffic in real-time and in a packet-by-packet manner was investigated, from which resulted the modification of two statistical tests known as Chi- Squared and Kolmogorov-Smirnov, later implemented in prototypes for traffic classification. To enable the packet-by-packet analysis, the two statistics of the aforementioned tests are calculated for a sliding window of values, which iterates each time a new packet of the flow arrives. The number of operations involved in the actualization of the statistics is constant and low, which enables obtaining a classification at any given moment of the duration of a flow. Each one of the two classification methods was implemented in a different prototype and then combined, using an heuristic, to obtain a third classifier. The classifiers were tested and evaluated separately resorting to new traffic traces, generated by the different applications considered in the study, captured in a network aggregation point. Even though the results obtained for each one of the two classifiers were good, presenting an accuracy above 70%, the combination of the two methods improves those results, correctly classifying more than 90% of the analysed flows. Additionally, the developed prototypes were compared with other similar tools discussed on the related literature and available online, and it was verified that, in many cases, the proposed classifiers produce better results for the analysed traces.
publishDate 2013
dc.date.none.fl_str_mv 2013-06
2013-06-01T00:00:00Z
2015-10-29T15:05:07Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.6/3885
url http://hdl.handle.net/10400.6/3885
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799136348308766720