A data-driven approach to predict the value and key features of collectible cars

Bibliographic details
Main author: Pires, Pedro Miguel Geraldes
Publication date: 2020
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/21876
Abstract: This study proposes a Data Mining approach to predict the selling price of collectible cars at auction and to determine which vehicle characteristics influence that price. RM Sotheby's, a prestigious auction house, allowed us to collect data on over 30,000 vehicles to build our data set. Using several models allowed us to analyze a large data set with 18 features. To determine which model suits the study best, 11 Data Mining models were compared on 4 metrics (MAE, NMAE, RAE and RMSPE); all 11 were tested on our data set using a rolling window scheme. "Xgboost", a decision-tree-based model, gave the best results (RMSPE = 12.69%). A method for extracting knowledge through sensitivity analysis was then applied, which identified the key features for the sale price (Brand.continent, Car.age, km.categorized and Model.identifier). A comparison with traditional investments was also made, together with an analysis of the impact of coronavirus. Our results showed superior performance compared with iron, the FTSE MIB and the FTSE 100. Since the onset of coronavirus there has been a significant drop in volume and sales, although the average price per car has increased, so we cannot confirm that there was an impact on this market. A more detailed analysis of 25 different car models was also conducted: 9 appreciated in price while the rest lost value. Overall, those 25 models depreciated by 5%.
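Note on the metrics and validation scheme: the abstract names four error metrics and a rolling window scheme without defining them. The sketch below uses the commonly cited definitions of MAE, NMAE (MAE divided by the range of the observed values), RAE and RMSPE, plus a simple fixed-size rolling window splitter. These definitions, the function names and the window parameters are assumptions made for illustration; the record does not state which exact formulas or window sizes the dissertation used.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nmae(y_true, y_pred):
    """MAE normalized by the range of the observed values (assumed definition)."""
    return mae(y_true, y_pred) / (max(y_true) - min(y_true))


def rae(y_true, y_pred):
    """Relative absolute error: total error relative to a predict-the-mean baseline."""
    mean_true = sum(y_true) / len(y_true)
    model_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    baseline_error = sum(abs(t - mean_true) for t in y_true)
    return model_error / baseline_error


def rmspe(y_true, y_pred):
    """Root mean squared percentage error, as a fraction (0.10 means 10%)."""
    squared = [((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rolling_window_splits(n_samples, train_size, test_size, step):
    """Yield (train_indices, test_indices) pairs for a fixed-size rolling window.

    Samples are assumed to be sorted by time, so each test block
    always follows its training block.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


if __name__ == "__main__":
    # Hypothetical hammer prices and predictions, for illustration only.
    prices = [100.0, 200.0, 400.0]
    predicted = [110.0, 180.0, 400.0]
    print("MAE:", mae(prices, predicted))
    print("NMAE:", nmae(prices, predicted))
    print("RAE:", rae(prices, predicted))
    print("RMSPE:", rmspe(prices, predicted))
    for train, test in rolling_window_splits(10, train_size=4, test_size=2, step=2):
        print(list(train), list(test))
```

Under these definitions, a model would be fitted on each training window and scored on the block that follows it, and the per-window scores would then be aggregated to compare models.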
id RCAP_ec0ce97106ed1a28db8590798d28292c
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/21876
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv A data-driven approach to predict the value and key features of collectible cars
title A data-driven approach to predict the value and key features of collectible cars
author Pires, Pedro Miguel Geraldes
author_facet Pires, Pedro Miguel Geraldes
author_role author
dc.contributor.author.fl_str_mv Pires, Pedro Miguel Geraldes
dc.subject.por.fl_str_mv Data mining
Collectible cars market
Alternative investments
Emotional assets
Coronavirus
Mineração de dados
Mercado de carros colecionáveis
Investimentos alternativos
Ativos emocionais
description This study proposes a Data Mining approach to predict the selling price of collectible cars at auction and to determine which vehicle characteristics influence that price. RM Sotheby's, a prestigious auction house, allowed us to collect data on over 30,000 vehicles to build our data set. Using several models allowed us to analyze a large data set with 18 features. To determine which model suits the study best, 11 Data Mining models were compared on 4 metrics (MAE, NMAE, RAE and RMSPE); all 11 were tested on our data set using a rolling window scheme. "Xgboost", a decision-tree-based model, gave the best results (RMSPE = 12.69%). A method for extracting knowledge through sensitivity analysis was then applied, which identified the key features for the sale price (Brand.continent, Car.age, km.categorized and Model.identifier). A comparison with traditional investments was also made, together with an analysis of the impact of coronavirus. Our results showed superior performance compared with iron, the FTSE MIB and the FTSE 100. Since the onset of coronavirus there has been a significant drop in volume and sales, although the average price per car has increased, so we cannot confirm that there was an impact on this market. A more detailed analysis of 25 different car models was also conducted: 9 appreciated in price while the rest lost value. Overall, those 25 models depreciated by 5%.
publishDate 2020
dc.date.none.fl_str_mv 2020-11-25T00:00:00Z
2020-11-25
2020-10
2021-02-04T09:49:01Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/21876
TID:202555933
url http://hdl.handle.net/10071/21876
identifier_str_mv TID:202555933
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134861144883200