Text Mining Techniques for Car Price Prediction

Detalhes bibliográficos
Autor(a) principal: Gonçalves, Ricardo Miguel Galvão
Data de Publicação: 2022
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10362/135551
Resumo: Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
id RCAP_897930f8a73a178a83a41e6cab8edf06
oai_identifier_str oai:run.unl.pt:10362/135551
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Text Mining Techniques for Car Price PredictionText MiningRegression AnalysisCar Price PredictionProject Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data ScienceModern data sources routinely contain information both in unstructured and structured forms, combining text with the usual numerical and categorical data. For instance, in websites dedicated for selling and buying cars the listings typically include a textual description of the car. Others also include a detailed list of numerical or categorical attributes, such as the total number of kilometers the car has, or it´s model. In this work project we apply text mining techniques to create predictors for car price regression from unstructured data, the textual description in car listings. Two different types of predictors were studied, the tf-idf features obtained from the n-gram count matrix, or the singular vectors derived from the decomposition of the tf-idf matrix. In this work we also examine the performance of reducing the vocabulary dimension by applying stemming, lemmatization or not applying either of those. We also compare the effects of creating the initial n-gram count matrix with only unigrams, unigrams and bigrams or only bigrams. Our regression experiment shows that Support Vector Regression performs best at car price prediction using text data as predictors with R2 = 0.77, MSE = 0.19 and MAE = 0.32. These results can be seen as respectable given the complex nature of the task.Henriques, Roberto André PereiraRUNGonçalves, Ricardo Miguel Galvão2022-03-30T16:58:48Z2022-03-022022-03-02T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10362/135551TID:202979733enginfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-03-11T05:13:54Zoai:run.unl.pt:10362/135551Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T03:48:26.983209Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Text Mining Techniques for Car Price Prediction
title Text Mining Techniques for Car Price Prediction
spellingShingle Text Mining Techniques for Car Price Prediction
Gonçalves, Ricardo Miguel Galvão
Text Mining
Regression Analysis
Car Price Prediction
title_short Text Mining Techniques for Car Price Prediction
title_full Text Mining Techniques for Car Price Prediction
title_fullStr Text Mining Techniques for Car Price Prediction
title_full_unstemmed Text Mining Techniques for Car Price Prediction
title_sort Text Mining Techniques for Car Price Prediction
author Gonçalves, Ricardo Miguel Galvão
author_facet Gonçalves, Ricardo Miguel Galvão
author_role author
dc.contributor.none.fl_str_mv Henriques, Roberto André Pereira
RUN
dc.contributor.author.fl_str_mv Gonçalves, Ricardo Miguel Galvão
dc.subject.por.fl_str_mv Text Mining
Regression Analysis
Car Price Prediction
topic Text Mining
Regression Analysis
Car Price Prediction
description Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
publishDate 2022
dc.date.none.fl_str_mv 2022-03-30T16:58:48Z
2022-03-02
2022-03-02T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/135551
TID:202979733
url http://hdl.handle.net/10362/135551
identifier_str_mv TID:202979733
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138085547540480