Text Mining Techniques for Car Price Prediction
Main author: | Gonçalves, Ricardo Miguel Galvão |
---|---|
Publication date: | 2022 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/135551 |
Abstract: | Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science |
id |
RCAP_897930f8a73a178a83a41e6cab8edf06 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/135551 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Text Mining Techniques for Car Price Prediction; Text Mining; Regression Analysis; Car Price Prediction; Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science; Modern data sources routinely contain information in both unstructured and structured forms, combining text with the usual numerical and categorical data. For instance, on websites dedicated to selling and buying cars, the listings typically include a textual description of the car. Others also include a detailed list of numerical or categorical attributes, such as the total number of kilometers the car has been driven, or its model. In this work project we apply text mining techniques to create predictors for car price regression from unstructured data, namely the textual description in car listings. Two types of predictors were studied: the tf-idf features obtained from the n-gram count matrix, and the singular vectors derived from the decomposition of the tf-idf matrix. We also examine how performance is affected by reducing the vocabulary size through stemming, lemmatization, or neither. We further compare the effects of building the initial n-gram count matrix from unigrams only, from unigrams and bigrams, or from bigrams only. Our regression experiment shows that Support Vector Regression performs best at car price prediction using text data as predictors, with R² = 0.77, MSE = 0.19 and MAE = 0.32.
These results can be seen as respectable given the complex nature of the task. Henriques, Roberto André Pereira; RUN; Gonçalves, Ricardo Miguel Galvão; 2022-03-30T16:58:48Z; 2022-03-02; 2022-03-02T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/135551; TID:202979733; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T05:13:54Z; oai:run.unl.pt:10362/135551; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:48:26.983209; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Text Mining Techniques for Car Price Prediction |
title |
Text Mining Techniques for Car Price Prediction |
spellingShingle |
Text Mining Techniques for Car Price Prediction; Gonçalves, Ricardo Miguel Galvão; Text Mining; Regression Analysis; Car Price Prediction |
author |
Gonçalves, Ricardo Miguel Galvão |
author_facet |
Gonçalves, Ricardo Miguel Galvão |
author_role |
author |
dc.contributor.none.fl_str_mv |
Henriques, Roberto André Pereira; RUN |
dc.contributor.author.fl_str_mv |
Gonçalves, Ricardo Miguel Galvão |
dc.subject.por.fl_str_mv |
Text Mining; Regression Analysis; Car Price Prediction |
topic |
Text Mining; Regression Analysis; Car Price Prediction |
description |
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-03-30T16:58:48Z; 2022-03-02; 2022-03-02T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/135551; TID:202979733 |
url |
http://hdl.handle.net/10362/135551 |
identifier_str_mv |
TID:202979733 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799138085547540480 |
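
The abstract in this record describes building tf-idf predictors from an n-gram count matrix, using unigrams only, unigrams and bigrams, or bigrams only. The sketch below illustrates that single step in plain Python. It is not the thesis's implementation: the function names `ngrams` and `tfidf_matrix`, the whitespace tokenizer, and the unsmoothed weighting tf * log(N / df) are all assumptions made for illustration, and the stemming, lemmatization, singular-vector and Support Vector Regression steps are left out.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams (joined by a space) with n_min <= n <= n_max."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_matrix(docs, n_min=1, n_max=2):
    """Build a dense tf-idf matrix from raw n-gram counts.

    Assumption: textbook weighting tf * log(N / df) with no smoothing
    or normalization; libraries usually apply smoothed variants.
    """
    # One n-gram count vector per document (hypothetical whitespace tokenizer).
    counts = [Counter(ngrams(d.lower().split(), n_min, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    matrix = [
        [c[t] * math.log(n_docs / df[t]) for t in vocab]
        for c in counts
    ]
    return vocab, matrix


# Made-up listing descriptions, for illustration only.
listings = ["diesel low mileage", "diesel one owner"]
unigram_vocab, unigram_features = tfidf_matrix(listings, 1, 1)
bigram_vocab, bigram_features = tfidf_matrix(listings, 2, 2)
```

Setting `n_min` and `n_max` to (1, 1), (1, 2) or (2, 2) corresponds to the three n-gram configurations compared in the abstract. With this unsmoothed weighting, a term that occurs in every listing (here `diesel`) receives weight zero.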