Telecom Churn Prediction: An approach Towards Big Data
Main author: | Coelho, António Fonseca |
---|---|
Publication date: | 2022 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/160464 |
Abstract: | Churn prediction is a crucial subject for telecom companies. Acquiring a new customer is more expensive than retaining an existing one. Identifying the customers who are likely to leave requires addressing multiple challenges. The first is caused by telecom datasets, which tend to be high-dimensional and at the same time very sparse, bringing multicollinearity and overfitting issues. Another challenge concerns variable data types: some variables are static, while others change over time. The nature of the use case creates a further difficulty. The goal is to predict who is leaving the service, but in any successful company there are many more clients staying than leaving. This creates an unbalanced dataset, in which the binary target variable is unevenly distributed between its classes. In this work, a pipeline targeting the telecom industry is proposed. The pipeline aims to address the churn problem, i.e., to identify the clients that have a high propensity to leave the service. It is designed to deal with the challenges identified and to be adaptable to other telecom datasets. The pipeline is composed of multiple steps. The first step restructures the data by realigning all clients on their last month in an active state stored in the system. The multiple observations per client are then compressed into one using statistics such as the median and standard deviation. Next, a feature selection method is applied; multiple options were considered, and they are evaluated at the end of the document. Models were then used to predict the target variable, and these models were adapted to handle the class imbalance. This work demonstrated that reasonable results can be achieved using a restructuring process and compressing statistics. It also demonstrated that reasonably good results can be achieved using a feature selection algorithm. |
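
The restructuring and compression steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the record layout, the field names, the choice to drop inactive months, and the use of class weighting for the imbalance are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict
from statistics import median, pstdev

# Hypothetical monthly records: (client_id, month_index, is_active, usage_value).
records = [
    ("a", 1, True, 100.0),
    ("a", 2, True, 120.0),
    ("a", 3, True, 80.0),
    ("a", 4, False, 0.0),   # inactive month, ignored below (assumption)
    ("b", 2, True, 50.0),
    ("b", 3, True, 70.0),
]


def realign(rows):
    """Re-index each client's active months so the last active month is offset 0."""
    by_client = defaultdict(list)
    for client, month, active, value in rows:
        if active:
            by_client[client].append((month, value))
    aligned = {}
    for client, months in by_client.items():
        last_active = max(month for month, _ in months)
        aligned[client] = sorted((month - last_active, value) for month, value in months)
    return aligned


def compress(aligned):
    """Collapse the observations of each client into one row of summary statistics."""
    summary = {}
    for client, months in aligned.items():
        values = [value for _, value in months]
        summary[client] = {"median": median(values), "std": pstdev(values)}
    return summary


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    One common way to adapt a model to an unbalanced target; assumed here,
    not taken from the source.
    """
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


aligned = realign(records)
features = compress(aligned)
weights = balanced_class_weights([0, 0, 0, 1])  # 1 = churned (hypothetical labels)
```

After realignment every client's history ends at offset 0, so clients who left in different calendar months become comparable, and `features` holds exactly one row per client that a classifier could consume together with `weights`.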
id |
RCAP_29af1b800a28360a9af70ae5acae1a4d |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/160464 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Telecom Churn Prediction: An approach Towards Big Data |
author |
Coelho, António Fonseca |
author_facet |
Coelho, António Fonseca |
author_role |
author |
dc.contributor.none.fl_str_mv |
Lopes, Marta; Lourenço, João; RUN |
dc.contributor.author.fl_str_mv |
Coelho, António Fonseca |
dc.subject.por.fl_str_mv |
Churn; Classification; Modeling; Machine learning; Telecom; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-03; 2022-03-01T00:00:00Z; 2023-11-24T18:55:02Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/160464 |
url |
http://hdl.handle.net/10362/160464 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799138162346295296 |