Machine and Deep Learning models for house price prediction in United States of America and Portugal

Bibliographic details
Main author: Sanchez de La Fuente, Catarina de Freitas
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/27005
Abstract: The present study describes the development process of a system to predict house prices in Portugal. The two main phases of this process were data extraction and the comparison of several algorithms. Data extraction was performed using Web Scraping techniques applied to the Mais Consultores site [1]. This study used Text Mining methods (Rule-based Matching and Similarity) to structure and obtain meaning from the extracted information. Afterwards, this thesis compared the application of Machine Learning and Deep Learning algorithms: Support Vector Machines (SVM), Decision Tree Regressor (DTR), Random Forest, K-Nearest Neighbour (KNN), Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Multi-layer Perceptron (MLP) and Long Short-term Memory (LSTM) Network. Finding this solution was the prime motivation of the present thesis. The results obtained by the algorithms used, both Machine Learning and Deep Learning, demonstrated that the algorithms needed more data for the training set. Additionally, the algorithms with the best results, i.e., with the lowest Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE) and the best score, were the Deep Learning algorithms.
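The three error measures named in the abstract can be illustrated with a minimal sketch. The function name `regression_errors` and the price values below are hypothetical and chosen only for illustration; they are not taken from the thesis, which may have computed these measures with a library instead.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of prices."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual = actual price minus predicted price, one per house.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE
    return mae, mse, rmse


# Hypothetical prices (illustrative values only, not data from the thesis).
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]
mae, mse, rmse = regression_errors(actual, predicted)
```

Lower values of all three measures indicate a closer fit. RMSE is expressed in the same unit as the prices, which makes it easier to compare against MAE than MSE is.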
id RCAP_cd911a53070202864b0188103b9e5b79
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/27005
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling The present study used the CRISP-DM methodology to characterise and describe the development process of a system for estimating house prices in Portugal. Two important phases of the process were data extraction and the comparison of several algorithms. Data extraction was carried out using Web Scraping techniques on the Mais Consultores site [1]. Text Mining methods (Rule-based Matching and Similarity) were used to structure and derive meaning from the information extracted from the site. Next, we compared the application of Machine Learning and Deep Learning algorithms: Support Vector Machines (SVM), Decision Tree Regressor (DTR), Random Forest, K-Nearest Neighbour (KNN), Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Multi-layer Perceptron (MLP) and Long Short-term Memory (LSTM) Network. Finding this solution was the main motivation of the present thesis. The results obtained by the algorithms used, both Machine Learning and Deep Learning, show that the algorithms needed more training data. Additionally, the algorithms with the best results, i.e., with the lowest Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE) and the highest score, were the Deep Learning algorithms.
2023-12-24T01:19:43Z
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-19T22:30:13.807315
dc.title.none.fl_str_mv Machine and Deep Learning models for house price prediction in United States of America and Portugal
author Sanchez de La Fuente, Catarina de Freitas
author_role author
dc.contributor.author.fl_str_mv Sanchez de La Fuente, Catarina de Freitas
dc.subject.por.fl_str_mv House price prediction
Machine learning
Deep learning
Text mining
House prices
publishDate 2022
dc.date.none.fl_str_mv 2022-12-16T00:00:00Z
2022-12-16
2022-11
2023-12-16T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/27005
TID:203134885
url http://hdl.handle.net/10071/27005
identifier_str_mv TID:203134885
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134864444751872