Demographics imputation in marketing sector by means of machine learning
Main author: | Venediktova, Margarita Aleksandrovna |
---|---|
Publication date: | 2023 |
Document type: | Dissertation |
Language: | English (eng) |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/150590 |
Summary: | Internship Report presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science |

Keywords: | Machine Learning; Missing Values Imputation |
---|---|
Contributor: | Pinheiro, Flávio Luís Portas; RUN |
Dates: | 2023-01-27; 2023-03-15T17:17:38Z |
Version: | Published version |
Format: | application/pdf |
Identifiers: | http://hdl.handle.net/10362/150590; TID:203247264; oai:run.unl.pt:10362/150590 |
Access rights: | Open access |
Institution: | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |

Abstract:

The goal of this project is to develop a predictive model to impute missing values in data collected through surveys (demographics data) and to evaluate its performance. There are currently two issues: the demographics data for each user is either incomplete or missing entirely. The current proof of concept (POC) is an attempt to exploit the capabilities of machine learning to impute the missing demographics data. Data cleaning, normalization, and feature selection were performed before applying sampling techniques and training several machine learning models. Two models were trained and tested: Random Forest and Gradient Boosting. Metrics appropriate for the current business purposes were then selected, and the models' performance was evaluated.

The results for the targets 'Ethnicity', 'Hispanic' and 'Household income' are not within the acceptable range and therefore cannot be used in production at the moment. With the default hyperparameters, both models show similar results for the 'Hispanic' and 'Ethnicity' response variables. The 'Household income' variable has the poorest results and cannot be predicted with adequate accuracy. The current POC suggests that accurately predicting demographic variables is a complex task with several challenges: a weak relationship between demographic variables and purchase behavior, purchase location, and the neighborhood and its demographic characteristics; unreliable data; and a sparse feature set. Further investigation of feature selection and the incorporation of other data sources into the training data should be considered.
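The workflow summarized in the abstract (train on users whose survey answer is known, predict it for users where it is missing, and compare Random Forest and Gradient Boosting with default hyperparameters) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the author's implementation: the data is synthetic, `impute_demographic` and `random_oversample` are hypothetical helpers, random oversampling stands in for the unspecified sampling technique, and macro-averaged F1 stands in for the unspecified business metrics.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def random_oversample(X, y, rng):
    """Duplicate rows of smaller classes until every class matches the largest one.

    Assumption: stands in for the (unspecified) sampling technique in the thesis.
    """
    classes, counts = np.unique(y, return_counts=True)
    parts = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=counts.max() - n, replace=True)
        parts.append(np.concatenate([cls_idx, extra]))
    idx = rng.permutation(np.concatenate(parts))
    return X[idx], y[idx]


def impute_demographic(X, y, seed=0):
    """Hypothetical helper: fit on users with a known answer, predict the missing ones.

    X: (n_users, n_features) numeric features, e.g. purchase behaviour (assumed).
    y: object array of survey answers, None where the answer is missing.
    Returns (imputed labels for the missing rows, validation macro-F1 per model).
    """
    rng = np.random.default_rng(seed)
    known = np.array([v is not None for v in y])
    X_known, y_known = X[known], y[known].astype(str)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_known, y_known, test_size=0.25, stratify=y_known, random_state=seed
    )
    # Resample only the training split so validation keeps the real class balance.
    X_tr, y_tr = random_oversample(X_tr, y_tr, rng)

    # Default hyperparameters for both models, as described in the abstract.
    models = {
        "random_forest": RandomForestClassifier(random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores, fitted = {}, {}
    for name, clf in models.items():
        # Scaling does not change tree splits; it only mirrors the normalization step.
        pipe = make_pipeline(StandardScaler(), clf).fit(X_tr, y_tr)
        # Assumption: macro F1 as a stand-in for the business metrics.
        scores[name] = f1_score(y_val, pipe.predict(X_val), average="macro")
        fitted[name] = pipe

    best = max(scores, key=scores.get)
    missing = ~known
    imputed = fitted[best].predict(X[missing]) if missing.any() else np.array([], dtype=str)
    return imputed, scores


# Usage on synthetic data (not the thesis data): two classes, 60 missing answers.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=400) > 0.8, "A", "B").astype(object)
y[rng.choice(400, size=60, replace=False)] = None
imputed, scores = impute_demographic(X, y)
print(scores)
print(len(imputed), "answers imputed")
```

Resampling is applied only to the training split so that the validation score reflects the original class balance. The printed scores come from synthetic data and say nothing about the results reported in the thesis.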