Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets
Main author: | Neto, Miguel Ângelo Silva |
---|---|
Publication date: | 2013 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.6/3885 |
Abstract: | Nowadays, traffic classification is one of the most important resources for managing computer networks. Tools and techniques that segregate network traffic into classes are critical for administrators to keep their networks operating at the required Quality of Service (QoS) and security levels. Nonetheless, the steady evolution of the infrastructure, and mainly of the terminal devices, together with the consequent increase in network complexity, makes this task much harder to achieve, both in terms of accuracy and of computational requirements. Among the factors that most hinder traffic classification are the encryption and evasive techniques adopted by network applications. Several researchers have therefore focused their efforts on finding new means to classify traffic or on improving existing ones. This dissertation discusses research on network traffic classification, focused on segregating network flows according to the application that generated them, regardless of the communication paradigms those applications use. To that end, a network scenario similar to a real one was set up in a lab environment, and several traffic traces generated by different contemporary applications were collected. These traces were first subjected to human analysis, which made it possible to identify behavior patterns without resorting to the packet contents, using only the empirical distribution of packet sizes. After this initial analysis, a set of signatures, each composed of one such empirical distribution and the name of the corresponding application, was built for each of the applications and traffic types under analysis.
Subsequently, the best way to match the signatures against network traffic in real time, packet by packet, was investigated. This resulted in modified versions of two statistical tests, Chi-Squared and Kolmogorov-Smirnov, which were later implemented in traffic classification prototypes. To enable packet-by-packet analysis, the statistics of both tests are computed over a sliding window of values that advances each time a new packet of the flow arrives. The number of operations needed to update the statistics is constant and low, so a classification can be obtained at any moment during a flow. Each of the two classification methods was implemented in a separate prototype, and the two were then combined, using a heuristic, into a third classifier. The classifiers were tested and evaluated separately on new traffic traces, generated by the applications considered in the study and captured at a network aggregation point. Although each of the two classifiers performed well on its own, with an accuracy above 70%, combining the two methods improves those results, correctly classifying more than 90% of the analysed flows. Additionally, the developed prototypes were compared with similar tools discussed in the related literature and available online, and in many cases the proposed classifiers produced better results for the analysed traces. |
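The abstract states that both statistics are recomputed over a sliding window with a constant, low number of operations per packet, but it does not give the formulas. The sketch below shows one way this can be done, under assumptions that are not taken from the dissertation. Packet lengths are grouped into 15 hypothetical 100-byte bins, signatures are smoothed so that no bin has zero probability, and a flow is assigned to the signature with the smallest statistic. All names (`bin_index`, `make_signature`, `SlidingChiSquared`, `ks_statistic`, `app_small`, `app_large`) are illustrative. For the Chi-Squared case it uses the identity chi2 = sum_k O_k^2 / (n p_k) - n, so each packet only touches the bins entering and leaving the window. The Kolmogorov-Smirnov distance is computed over the bins, so its cost depends on the fixed number of bins rather than on the window size. The heuristic the dissertation uses to combine both classifiers is not reproduced here.

```python
from collections import deque

NUM_BINS = 15      # hypothetical: 15 bins of 100 bytes, last bin catches >= 1400
BIN_WIDTH = 100


def bin_index(length):
    """Map an IP packet length in bytes to a histogram bin (assumed binning)."""
    return min(length // BIN_WIDTH, NUM_BINS - 1)


def make_signature(sample_lengths, alpha=1.0):
    """Empirical distribution of packet lengths for one application.

    Additive smoothing (alpha) keeps every bin probability > 0, which the
    chi-squared statistic needs; this smoothing is an assumption of the sketch.
    """
    counts = [alpha] * NUM_BINS
    for length in sample_lengths:
        counts[bin_index(length)] += 1
    total = sum(counts)
    return [c / total for c in counts]


class SlidingChiSquared:
    """Chi-squared statistic of the last `window` packets of a flow against one
    signature, updated with O(1) work per packet.

    Since sum(O_k) = n and sum(p_k) = 1,
        chi2 = sum((O_k - n*p_k)**2 / (n*p_k)) = S / n - n,  S = sum(O_k**2 / p_k),
    so only S must be kept up to date when a bin count changes by +1 or -1.
    """

    def __init__(self, signature, window=32):
        self.p = signature
        self.window = window
        self.counts = [0] * len(signature)
        self.recent = deque()  # bin indices of the packets in the window
        self.s = 0.0

    def _bump(self, k, delta):
        # (O + delta)**2 - O**2 == 2*O*delta + 1 for delta in (+1, -1)
        o = self.counts[k]
        self.s += (2 * o * delta + 1) / self.p[k]
        self.counts[k] = o + delta

    def push(self, length):
        """Add one packet, evict the oldest if the window is full, return chi2."""
        k = bin_index(length)
        self.recent.append(k)
        self._bump(k, +1)
        if len(self.recent) > self.window:
            self._bump(self.recent.popleft(), -1)
        n = len(self.recent)
        return self.s / n - n


def ks_statistic(counts, signature):
    """Kolmogorov-Smirnov distance between the window histogram and a signature.

    Cost grows with the number of bins, not with the window size.
    """
    n = sum(counts)
    cum_obs = cum_sig = d = 0.0
    for o, p in zip(counts, signature):
        cum_obs += o / n
        cum_sig += p
        d = max(d, abs(cum_obs - cum_sig))
    return d


# Usage with two hypothetical signatures and one short flow.
signatures = {
    "app_small": make_signature([60, 80, 120, 90, 70] * 20),
    "app_large": make_signature([1400, 1500, 1300, 1450] * 20),
}
trackers = {name: SlidingChiSquared(sig, window=16) for name, sig in signatures.items()}
for length in [1400, 1480, 60, 1350, 1500, 1420]:
    chi2 = {name: t.push(length) for name, t in trackers.items()}

best_chi2 = min(chi2, key=chi2.get)
best_ks = min(signatures, key=lambda name: ks_statistic(trackers[name].counts, signatures[name]))
print(best_chi2, best_ks)  # both pick "app_large" for this flow
```

Because `S` is updated incrementally, floating-point error can build up over very long flows; recomputing it from `counts` now and then is a cheap safeguard that keeps the per-packet cost low on average.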
id | RCAP_d079aa930e8d4f0acd362e04a9088314 |
---|---|
oai_identifier_str | oai:ubibliorum.ubi.pt:10400.6/3885 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Traffic classification based on statistical tests for matching empirical distributions of lengths of IP packets |
author | Neto, Miguel Ângelo Silva |
author_facet | Neto, Miguel Ângelo Silva |
author_role | author |
dc.contributor.none.fl_str_mv | Inácio, Pedro Ricardo Morais; uBibliorum |
dc.contributor.author.fl_str_mv | Neto, Miguel Ângelo Silva |
dc.subject.por.fl_str_mv | Network traffic - Classification; Traffic in the dark - Classification; Network traffic - Monitoring; Chi-Squared test - Statistics; Kolmogorov-Smirnov test - Statistics; Scientific Domain/Area: Engineering and Technology |
publishDate | 2013 |
dc.date.none.fl_str_mv | 2013-06; 2013-06-01T00:00:00Z; 2015-10-29T15:05:07Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
format | masterThesis |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10400.6/3885 |
url | http://hdl.handle.net/10400.6/3885 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |