Demographics imputation in marketing sector by means of machine learning

Bibliographic details
Main author: Venediktova, Margarita Aleksandrovna
Publication date: 2023
Document type: Dissertation
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/150590
Summary: Internship Report presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
OAI identifier: oai:run.unl.pt:10362/150590
Abstract: The goal of this project is to develop a predictive model to impute missing values in data collected through surveys (demographic data) and to evaluate its performance. There are currently two issues: the demographic data for each user is either incomplete or missing entirely. The current proof of concept (POC) is an attempt to exploit the capabilities of machine learning to impute the missing demographic data. Data cleaning, normalization and feature selection were performed before applying sampling techniques and training several machine learning models. The models trained and tested were Random Forest and Gradient Boosting. Metrics appropriate for the current business purposes were then selected, and the models’ performance was evaluated. The results for the targets ‘Ethnicity’, ‘Hispanic’ and ‘Household income’ are not within the acceptable range and therefore cannot be used in production at the moment. The metrics obtained with the default hyperparameters indicate that both models perform similarly on the ‘Hispanic’ and ‘Ethnicity’ response variables. The ‘Household income’ variable appears to have the poorest results and cannot be predicted with adequate accuracy. The POC suggests that accurately predicting demographic variables is a complex task accompanied by several challenges: weak relationships between the demographic variables and purchase behavior, purchase location, and neighborhood and its demographic characteristics; unreliable data; and a sparse feature set. Further investigation of feature selection and the incorporation of other data sources into the training data should be considered.
Contributors: Pinheiro, Flávio Luís Portas; RUN
Keywords: Machine Learning; Missing Values Imputation
Dates: 2023-03-15T17:17:38Z; 2023-01-27
Status: published version (info:eu-repo/semantics/publishedVersion)
Type: master's thesis (info:eu-repo/semantics/masterThesis)
Identifiers: http://hdl.handle.net/10362/150590; TID:203247264
Rights: open access (info:eu-repo/semantics/openAccess)
Format: application/pdf
Repository: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)
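The abstract frames imputation as supervised learning: users who answered a survey question form the training set, and a trained model fills in the answer for users who did not. The sketch below illustrates that split, hold-out evaluation and fill-in loop using only the Python standard library. It is a minimal sketch under stated assumptions, not the report's implementation: the field names (`spend`, `visits`, `household_income`), the toy records and the helper functions are hypothetical, plain accuracy stands in for the unspecified business metrics, and a k-nearest-neighbour majority vote replaces the Random Forest and Gradient Boosting models the report actually trained.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: two numeric behaviour features per user and a survey
# answer ("household_income") that is None when the user skipped the question.
USERS = [
    {"spend": 10.0, "visits": 1.0, "household_income": "low"},
    {"spend": 12.0, "visits": 2.0, "household_income": "low"},
    {"spend": 11.0, "visits": 1.5, "household_income": "low"},
    {"spend": 80.0, "visits": 9.0, "household_income": "high"},
    {"spend": 85.0, "visits": 8.0, "household_income": "high"},
    {"spend": 90.0, "visits": 10.0, "household_income": "high"},
    {"spend": 13.0, "visits": 2.0, "household_income": None},
    {"spend": 88.0, "visits": 9.5, "household_income": None},
]


def split_by_target(records, target):
    """Separate users whose target answer is known from users where it is missing."""
    known = [r for r in records if r[target] is not None]
    missing = [r for r in records if r[target] is None]
    return known, missing


def knn_predict(train, features, target, query, k=3):
    """Majority vote of the k nearest known users (Euclidean distance).

    Hypothetical stand-in for the Random Forest / Gradient Boosting models
    named in the report; any fit-and-predict classifier could take its place.
    """
    def distance(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    nearest = sorted(train, key=lambda r: distance(r, query))[:k]
    return Counter(r[target] for r in nearest).most_common(1)[0][0]


def impute(records, features, target, k=3, holdout=0.25, seed=0):
    """Return (imputed copies of the missing records, hold-out accuracy)."""
    known, missing = split_by_target(records, target)
    shuffled = known[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * holdout))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Score the model on users whose true answer is known but hidden from it.
    hits = sum(knn_predict(train, features, target, r, k) == r[target] for r in test)
    accuracy = hits / len(test)
    # Fill the gaps, using every known user as the reference set.
    filled = [{**r, target: knn_predict(known, features, target, r, k)} for r in missing]
    return filled, accuracy


filled, acc = impute(USERS, ["spend", "visits"], "household_income")
print([r["household_income"] for r in filled], acc)  # → ['low', 'high'] 1.0
```

Because k-nearest neighbours is distance-based, features on very different scales should be normalized first, one of the preprocessing steps the abstract lists. Swapping in scikit-learn's `RandomForestClassifier` or `GradientBoostingClassifier` would keep the same split, evaluate and fill structure.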