Telecom Churn Prediction: An approach Towards Big Data

Coelho, António Fonseca

Bibliographic details
Main author:	Coelho, António Fonseca
Publication date:	2022
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	http://hdl.handle.net/10362/160464
Abstract:	Churn prediction is a crucial subject for telecom companies, because acquiring a new customer is more expensive than retaining an existing one. Identifying customers at risk of leaving requires addressing multiple challenges. The first stems from telecom datasets themselves: they tend to be high-dimensional and, at the same time, very sparse, which brings multicollinearity and overfitting issues. Another challenge concerns variable data types, since some variables are static while others change over time. The nature of the use case creates a further difficulty. The goal is to predict who is leaving the service, but in any successful company many more clients stay than leave. This produces an unbalanced dataset, in which the binary target variable is unevenly distributed between its classes. In this work, a pipeline targeting the telecom industry is proposed. The pipeline addresses the churn problem, i.e., it identifies the clients with a high propensity to leave the service, and it is designed both to deal with the challenges identified and to be adaptable to other telecom datasets. It is composed of multiple steps. The first step restructured the data by realigning every client on their last month in an active state stored in the system. The multiple observations per client were then compressed into a single one using statistics such as the median and the standard deviation. Next, a feature selection method was applied; multiple options were considered and are evaluated at the end of this document. Finally, models were used to predict the target variable, and these models were adapted to handle the class imbalance. This work demonstrated that reasonable results can be achieved using a restructuring process and compression statistics. It also demonstrated that reasonably good results can be achieved using a feature selection algorithm.
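The restructuring and compression steps described in the abstract can be illustrated with a minimal Python sketch. It assumes monthly records shaped as dictionaries with hypothetical keys `client_id`, `month` (an integer index), `active`, and numeric usage features such as `minutes`, plus a hypothetical `window` parameter for how many months to keep; none of these names come from the dissertation, whose actual implementation is not part of this record. Each client is aligned on their last active month, later months are dropped, and each feature is reduced to its median and standard deviation.

```python
from collections import defaultdict
from statistics import median, pstdev


def realign(records, window=3):
    """Align each client on their last active month.

    Returns, per client, the last `window` monthly records up to and including
    that month, ordered oldest to newest. Months after it are discarded.
    """
    by_client = defaultdict(list)
    for rec in records:
        by_client[rec["client_id"]].append(rec)

    aligned = {}
    for client_id, recs in by_client.items():
        active_months = [r["month"] for r in recs if r["active"]]
        if not active_months:
            continue  # assumption: clients never seen active are skipped
        last_active = max(active_months)
        history = sorted(
            (r for r in recs if r["month"] <= last_active),
            key=lambda r: r["month"],
        )
        aligned[client_id] = history[-window:]
    return aligned


def compress(aligned, features):
    """Collapse each client's aligned history into one row of summary statistics."""
    rows = {}
    for client_id, history in aligned.items():
        row = {}
        for feature in features:
            values = [r[feature] for r in history]
            row[f"{feature}_median"] = median(values)
            # Population std, so a single-month history yields 0.0 instead of an error.
            row[f"{feature}_std"] = pstdev(values)
        rows[client_id] = row
    return rows


records = [
    {"client_id": "a", "month": 1, "active": True, "minutes": 100},
    {"client_id": "a", "month": 2, "active": True, "minutes": 120},
    {"client_id": "a", "month": 3, "active": True, "minutes": 80},
    {"client_id": "a", "month": 4, "active": False, "minutes": 0},
    {"client_id": "b", "month": 2, "active": True, "minutes": 50},
    {"client_id": "b", "month": 3, "active": True, "minutes": 50},
]
rows = compress(realign(records), ["minutes"])
```

Client `a`'s inactive fourth month is excluded before the statistics are computed, so every client ends up as one row of the same shape regardless of when they were last active. The population standard deviation (`pstdev`) is used so that a client with a single month of history does not raise an error; whether the dissertation used the sample or population form is not stated in this record.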
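The abstract ties high-dimensional, sparse data to multicollinearity and mentions a feature selection step, but this record does not name the algorithm used. As an assumption, a simple filter could address both symptoms: drop columns that are almost always zero, then greedily drop any column whose absolute Pearson correlation with an already kept column exceeds a threshold. The function names, the column names, and the thresholds `min_nonzero` and `max_corr` are hypothetical and chosen only for illustration; this is not presented as the dissertation's method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def select_features(columns, min_nonzero=0.05, max_corr=0.9):
    """Greedy filter over an insertion-ordered {name: values} dict.

    1. Drops sparse columns whose share of non-zero values is below `min_nonzero`.
    2. Keeps a remaining column only if |r| <= `max_corr` with every column kept so far.
    """
    kept = []
    for name, values in columns.items():
        nonzero_share = sum(1 for v in values if v != 0) / len(values)
        if nonzero_share < min_nonzero:
            continue
        if all(abs(pearson(values, columns[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept


columns = {
    "minutes_median": [1, 2, 3, 4],
    "minutes_std": [2, 4, 6, 8.1],   # nearly collinear with minutes_median
    "calls_median": [4, 1, 3, 2],
    "roaming_median": [0, 0, 0, 0],  # entirely zero, i.e. maximally sparse
}
print(select_features(columns))  # → ['minutes_median', 'calls_median']
```

The greedy pass keeps the first column of each correlated pair, so column order matters; a model-based or wrapper selector would be a different design choice with a different cost.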
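The abstract states that the models were adapted to the unbalanced target but does not say how. One common adaptation is class reweighting, sketched below under that assumption: each class gets the weight n_samples / (n_classes * class_count), the same rule scikit-learn applies for `class_weight="balanced"`, and the loss counts every sample with its class weight. The function names are hypothetical; resampling the training set is a common alternative.

```python
from collections import Counter
from math import log


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Binary cross-entropy in which each sample counts with its class weight."""
    total, norm = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        w = weights[y]
        total += -w * (y * log(p) + (1 - y) * log(1 - p))
        norm += w
    return total / norm


labels = [0] * 8 + [1] * 2  # 1 = churned; churners are the minority
weights = balanced_class_weights(labels)
print(weights)  # → {0: 0.625, 1: 2.5}

# A model that gives every client a low churn probability looks better under the
# unweighted loss than under the class-weighted one.
always_stay = [0.1] * len(labels)
print(weighted_log_loss(labels, always_stay, {0: 1.0, 1: 1.0}))
print(weighted_log_loss(labels, always_stay, weights))
```

With these weights each class contributes the same total weight (8 × 0.625 = 2 × 2.5 = 5), so a model can no longer lower its loss simply by predicting that nobody churns.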

Item metadata

id	RCAP_29af1b800a28360a9af70ae5acae1a4d
oai_identifier_str	oai:run.unl.pt:10362/160464
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	Telecom Churn Prediction: An approach Towards Big Data
title	Telecom Churn Prediction: An approach Towards Big Data
title_short	Telecom Churn Prediction: An approach Towards Big Data
title_full	Telecom Churn Prediction: An approach Towards Big Data
title_fullStr	Telecom Churn Prediction: An approach Towards Big Data
title_full_unstemmed	Telecom Churn Prediction: An approach Towards Big Data
title_sort	Telecom Churn Prediction: An approach Towards Big Data
author	Coelho, António Fonseca
author_facet	Coelho, António Fonseca
author_role	author
dc.contributor.none.fl_str_mv	Lopes, Marta; Lourenço, João; RUN
dc.contributor.author.fl_str_mv	Coelho, António Fonseca
dc.subject.por.fl_str_mv	Churn; Classification; Modeling; Machine learning; Telecom; Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
topic	Churn; Classification; Modeling; Machine learning; Telecom; Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
publishDate	2022
dc.date.none.fl_str_mv	2022-03; 2022-03-01T00:00:00Z; 2023-11-24T18:55:02Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/160464
url	http://hdl.handle.net/10362/160464
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799138162346295296
