Telecom Churn Prediction: An approach Towards Big Data

Bibliographic details
Main author: Coelho, António Fonseca
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/160464
Abstract: Churn prediction is a crucial subject for telecom companies. Acquiring a new customer is more expensive than retaining an existing one. Identifying customers who are likely to leave requires addressing multiple challenges. The first is caused by telecom datasets themselves. These tend to be high-dimensional and at the same time very sparse, which brings multicollinearity and overfitting issues. Another challenge concerns variable data types: some variables are static, while others change over time. The nature of the use case creates a further difficulty. The goal is to predict who is leaving the service but, in any successful company, far more clients stay than leave. This creates an unbalanced dataset, in which the binary target variable is unevenly distributed between its classes. In this work, a pipeline targeting the telecom industry is proposed. The pipeline aims to address the churn problem, i.e., to identify the clients that have a high propensity to leave the service. It is designed to deal with the challenges identified above and to be adaptable to other telecom datasets. The pipeline is composed of multiple steps. The first step restructures the data by realigning all clients by their last active month stored in the system. The multiple observations per client are then compressed into one using statistics such as the median and the standard deviation. After that, a feature selection method is applied; multiple options were considered and are evaluated at the end of this document. Models are then used to predict the target variable. These models were adapted to handle the class imbalance. This work demonstrated that reasonable results can be achieved using a restructuring process and compressing statistics. It also demonstrated that reasonably good results can be achieved using a feature selection algorithm.
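The restructuring, compression, and imbalance-handling steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the record layout, the `usage` variable, the choice to drop inactive months, the use of the population standard deviation, and the inverse-frequency class weighting are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict
from statistics import median, pstdev

# Hypothetical monthly records: (client_id, month_index, is_active, usage).
RECORDS = [
    ("a", 1, True, 10.0),
    ("a", 2, True, 14.0),
    ("a", 3, True, 12.0),
    ("a", 4, False, 0.0),
    ("b", 2, True, 5.0),
    ("b", 3, True, 7.0),
]


def realign_by_last_active_month(records):
    """Re-index each client's active months so that 0 is the last active month."""
    by_client = defaultdict(list)
    for client_id, month, is_active, usage in records:
        if is_active:  # assumption: inactive months are simply dropped
            by_client[client_id].append((month, usage))
    aligned = {}
    for client_id, rows in by_client.items():
        last_active = max(month for month, _ in rows)
        # Offset 0 is the last active month, -1 the month before, and so on.
        aligned[client_id] = sorted((month - last_active, usage) for month, usage in rows)
    return aligned


def compress(aligned):
    """Collapse each client's observations into one row of summary statistics."""
    summary = {}
    for client_id, rows in aligned.items():
        values = [usage for _, usage in rows]
        summary[client_id] = {
            "usage_median": median(values),
            "usage_std": pstdev(values),  # population standard deviation
        }
    return summary


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * count) for cls, count in counts.items()}


aligned = realign_by_last_active_month(RECORDS)
features = compress(aligned)
weights = balanced_class_weights([0, 0, 0, 1])  # 1 = churned (minority class)
```

After realignment every client shares the same time axis ending at offset 0, so the per-client statistics describe comparable periods. The class weights give the rarer churn class a larger weight, which is one common way to adapt a model to an unbalanced target; the abstract does not say which adaptation the author used.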
id RCAP_29af1b800a28360a9af70ae5acae1a4d
oai_identifier_str oai:run.unl.pt:10362/160464
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Telecom Churn Prediction: An approach Towards Big Data
author Coelho, António Fonseca
author_facet Coelho, António Fonseca
author_role author
dc.contributor.none.fl_str_mv Lopes, Marta
Lourenço, João
RUN
dc.contributor.author.fl_str_mv Coelho, António Fonseca
dc.subject.por.fl_str_mv Churn
Classification
Modeling
Machine learning
Telecom
Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
publishDate 2022
dc.date.none.fl_str_mv 2022-03
2022-03-01T00:00:00Z
2023-11-24T18:55:02Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/160464
url http://hdl.handle.net/10362/160464
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138162346295296