Comparison of anomaly detection techniques applied to different problems in the telecom industry

Rechena, Pedro Miguel David

Bibliographic details
Main author:	Rechena, Pedro Miguel David
Publication date:	2021
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10362/127796
Abstract:	With the growth of digital transformation in companies, a huge amount of data is generated every second as a result of various processes. This data often contains important information which, when properly analyzed, can help a company gain a competitive advantage. One data processing task common to many applications is the detection of anomalies, that is, data points or groups of data points that stand out from most of the others. Because the generally large volumes of data make it infeasible to have an operator constantly analyzing the data for anomalous values, this dissertation explores the area of Data Mining called anomaly detection. We first develop anomaly detection software in Python that applies 10 different anomaly detection algorithms to an arbitrary dataset after automatically optimizing their parameters. Before applying these algorithms, the software also scales the data and imputes missing values. It outputs the performance metrics of each algorithm, the values of the optimized parameters, and plots for visualizing the results, generated using t-SNE. This software was then applied to three case studies to compare the performance of different anomaly detection approaches on real-world datasets. These datasets present an increasing level of difficulty in terms of the amount of missing data and the uncertainty of the ground truth regarding the anomalies. In the first case study, we detected fraudulent bank transactions using a public dataset. In the second, we identified clients of a telecommunications company who were likely to miss their payment, leading to contract termination, using a dataset from a telecommunications company. In the third, we detected low quality of internet service, again using a large dataset with real measurements from a telecommunications company. Finally, we implemented a state-of-the-art neural network model especially suited to identifying anomalies in time-series data. We optimized the parameters of the network and applied it to the problem of low quality of service.
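
The preprocessing-then-detection flow described in the abstract (impute missing values, scale the data, score each point) can be illustrated with a minimal sketch. This is not the dissertation's software: the record does not name its 10 algorithms or its optimization procedure, so the detector below is a simple stand-in (distance from the mean in standardized units), and the function names, the sample data and the threshold of 2.0 are all hypothetical.

```python
import math
import statistics


def impute_median(rows):
    """Replace None in each column with the median of that column's observed values."""
    cols = list(zip(*rows))
    medians = [statistics.median([v for v in col if v is not None]) for col in cols]
    return [[medians[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def standardize(rows):
    """Scale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(col) for col in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in cols]
    return [[(v - means[j]) / stds[j] for j, v in enumerate(row)] for row in rows]


def anomaly_scores(scaled_rows):
    """Stand-in detector: Euclidean distance of each scaled row from the origin."""
    return [math.sqrt(sum(v * v for v in row)) for row in scaled_rows]


def detect(rows, threshold=2.0):
    """Impute, scale, score, and flag rows whose score exceeds the (assumed) threshold."""
    scaled = standardize(impute_median(rows))
    return [score > threshold for score in anomaly_scores(scaled)]


# Hypothetical toy data: five similar rows (one with a missing value) and one outlier.
data = [
    [1.0, 10.0],
    [1.1, None],
    [0.9, 10.2],
    [1.0, 9.8],
    [1.2, 10.1],
    [8.0, 30.0],
]
flags = detect(data)
print(flags)
```

Imputing before scaling keeps the scaling statistics defined for every row. A real comparison such as the one the abstract describes would replace `anomaly_scores` with each candidate algorithm and tune its parameters against labeled data.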

Item metadata

id	RCAP_8b38625a51fbc4775b175f59af418d89
oai_identifier_str	oai:run.unl.pt:10362/127796
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Comparison of anomaly detection techniques applied to different problems in the telecom industry
author	Rechena, Pedro Miguel David
author_facet	Rechena, Pedro Miguel David
author_role	author
dc.contributor.none.fl_str_mv	Bernardo, Luís; Zejnilović, Sabina; RUN
dc.contributor.author.fl_str_mv	Rechena, Pedro Miguel David
dc.subject.por.fl_str_mv	Anomaly Detection; Machine learning; Unsupervised Learning; Time series; LSTM; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
topic	Anomaly Detection; Machine learning; Unsupervised Learning; Time series; LSTM; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
publishDate	2021
dc.date.none.fl_str_mv	2021-11-16T17:08:01Z 2021-02 2021-02-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/127796
url	http://hdl.handle.net/10362/127796
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799138065433755648
