Type extraction from real estate listings

Bibliographic details
Main author: Serras, Alexandra Martins
Publication date: 2020
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/109201
Summary: Project Work presented as a partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
id RCAP_c9a61a902d73a454f96edb39350d9d48
oai_identifier_str oai:run.unl.pt:10362/109201
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
abstract As the real estate market grows, products that aggregate listings from several websites are starting to appear. Given the number of real estate websites, and consequently the number of property listings, tagging listings by hand is not feasible. This tagging is fundamental to building products from the extracted listings: Automated Valuation Models, Outlier Detection, and even search filters depend on reliable extraction of the property type. This project aimed to show that building such a model does not require black-box algorithms, which offer little interpretability and demand a higher level of expertise to debug and maintain. These types of algorithms also tend to require more training data, which has to be labelled manually, a time-consuming task. Using a list of keywords to extract from the text and an XGBoost model, we created a package that extracts the listing type and provides some logging information. The model reached 95% accuracy across all categories, but it struggled with listings that can be identified as new developments. This approach showed that a state-of-the-art model, which can be complicated to understand, is not always needed to obtain good results.
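The keyword step described in the abstract can be illustrated with a minimal sketch. The keyword list, the function name `keyword_features`, and the sample listing are hypothetical, since this record does not give the thesis's actual keywords or package interface. The sketch covers feature extraction only; in the approach described, counts like these would be passed to an XGBoost classifier trained on labelled listings.

```python
import re
from typing import Dict, List

# Hypothetical keyword list; the keywords used in the thesis are not given in this record.
KEYWORDS: List[str] = ["apartment", "house", "land", "garage", "new development"]


def keyword_features(text: str, keywords: List[str] = KEYWORDS) -> Dict[str, int]:
    """Count whole-word occurrences of each keyword in a listing description."""
    lowered = text.lower()
    features: Dict[str, int] = {}
    for kw in keywords:
        # \b anchors keep "land" from matching inside words such as "island".
        pattern = r"\b" + re.escape(kw) + r"\b"
        features[kw] = len(re.findall(pattern, lowered))
    return features


# Illustrative listing text (invented for this example).
listing = "Bright apartment with garage, part of a new development near the river."
features = keyword_features(listing)
```

Because every feature is a named keyword count, a prediction can be traced back to the words that triggered it, which fits the interpretability goal stated in the abstract.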
dc.title.none.fl_str_mv Type extraction from real estate listings
author Serras, Alexandra Martins
author_facet Serras, Alexandra Martins
author_role author
dc.contributor.none.fl_str_mv Gonçalves, Rui Alexandre Henriques
RUN
dc.contributor.author.fl_str_mv Serras, Alexandra Martins
dc.subject.por.fl_str_mv Text Classification
Real Estate
Xgboost
Supervised Learning
publishDate 2020
dc.date.none.fl_str_mv 2020-12-23T16:25:02Z
2020-10-30
2020-10-30T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/109201
TID:202569837
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138027621056512