Cross-company customer churn prediction in telecommunication: a comparison of data transformation methods

Bibliographic details
Main author: Amin, Adnan
Publication date: 2019
Other authors: Shah, Babar, Khattak, Asad Masood, Moreira, Fernando, Ali, Gohar, Rocha, Álvaro, Anwar, Sajid
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/11328/2679
Abstract: Cross-Company Churn Prediction (CCCP) is a research domain in which one company (the target), which lacks sufficient data, uses data from another company (the source) to predict customer churn. To support CCCP, the cross-company data is usually transformed towards a normal distribution similar to that of the target company's data before a CCCP model is built. However, it is still unclear which data transformation method is most effective in CCCP. Moreover, the impact of data transformation methods on CCCP model performance with different classifiers has not been comprehensively explored in the telecommunication sector. In this study, we devised a CCCP model using data transformation methods (i.e., log, z-score, rank and Box-Cox) and presented not only an extensive comparison to validate the impact of these transformation methods on CCCP, but also an evaluation of the underlying baseline classifiers (i.e., Naive Bayes (NB), K-Nearest Neighbour (KNN), Gradient Boosted Tree (GBT), Single Rule Induction (SRI) and Deep learner Neural net (DP)) for customer churn prediction in the telecommunication sector using the above-mentioned transformation methods. We performed experiments on publicly available datasets from the telecommunication sector. The results demonstrate that most of the data transformation methods (namely log, rank and Box-Cox) significantly improve CCCP performance. However, the z-score transformation did not achieve results as good as those of the other transformation methods in this study. Moreover, the NB-based CCCP model performed best on the transformed data, DP, KNN and GBT performed at an average level, while the SRI classifier did not show significant results in terms of the commonly used evaluation measures (i.e., probability of detection, probability of false alarm, area under the curve and g-mean).
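The abstract names four per-feature transformations and four evaluation measures. The following is a minimal, stdlib-only Python sketch of textbook versions of them, assuming each feature column is a plain list of numbers; it is not the authors' implementation. The function names, the `log1p` offset for zero counts, the fixed Box-Cox `lam` argument (in practice lambda is often estimated by maximum likelihood), the average-rank tie handling, and the optional `ref` argument of `z_score` (e.g. standardising source data with target statistics) are all assumptions for illustration. g-mean is taken here as the geometric mean of pd and 1 - pf.

```python
import math
from statistics import mean, pstdev


def log_transform(xs):
    # log(x + 1) keeps zero-valued counts finite; assumes all xs >= 0 (assumption).
    return [math.log1p(x) for x in xs]


def z_score(xs, ref=None):
    # Standardise xs with the mean/std of `ref` (defaults to xs itself).
    # Passing target-company data as `ref` is a hypothetical CCCP variant.
    # Assumes `ref` is not constant (pstdev would be 0).
    ref = xs if ref is None else ref
    mu, sd = mean(ref), pstdev(ref)
    return [(x - mu) / sd for x in xs]


def rank_transform(xs):
    # Replace each value by its 1-based rank; tied values share their average rank.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def box_cox(xs, lam):
    # Box-Cox power transform for strictly positive xs with a given lambda.
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


def pd_pf_gmean(y_true, y_pred):
    # Labels: 1 = churner, 0 = non-churner.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pd = tp / (tp + fn)  # probability of detection (recall on churners)
    pf = fp / (fp + tn)  # probability of false alarm
    return pd, pf, math.sqrt(pd * (1 - pf))


def auc(y_true, scores):
    # Mann-Whitney estimate of AUC: the probability that a churner scores
    # higher than a non-churner, with ties counted as 1/2.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(rank_transform([0.0, 12.5, 12.5, 230.0]))  # [1.0, 2.5, 2.5, 4.0]
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.4, 0.1]))   # 0.875
```

Rank and log transforms need no fitted parameters, which makes them easy to apply identically to source and target data; z-score and Box-Cox depend on statistics or a lambda that must be chosen from some reference sample.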
id RCAP_dc7c83d3cbdb8aa93c1e7290b3ddd2b6
oai_identifier_str oai:repositorio.uportu.pt:11328/2679
network_acronym_str RCAP
dc.subject.por.fl_str_mv Churn prediction
Cross-company
Data transformation
Box-cox
Rank
Log
Z-Score
dc.date.none.fl_str_mv 2019-05-10T10:19:37Z
2019-06-01T00:00:00Z
2019-06
2020-06-30T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.rights.driver.fl_str_mv info:eu-repo/semantics/embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP