Type extraction from real estate listings
Main author: | Serras, Alexandra Martins |
---|---|
Publication date: | 2020 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/109201 |
Abstract: | Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence |
id |
RCAP_c9a61a902d73a454f96edb39350d9d48 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/109201 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Type extraction from real estate listings; Text Classification; Real Estate; XGBoost; Supervised Learning; Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence; As the real estate market grows, products that aggregate listings from several websites are starting to appear. Given the number of real estate websites, and consequently the number of property listings that exist, it is not feasible to tag listings by hand. This tagging is fundamental to building products from the extracted listings: Automated Valuation Models, Outlier Detection and even search filters all depend on reliable extraction of the property type. This project aimed to show that creating such a model does not require black-box algorithms, which offer little interpretability and demand a higher level of expertise to debug and maintain. These types of algorithms also tend to need more training data, which has to be labelled manually, a time-consuming task. Using a list of keywords to extract from the text and an XGBoost model, we created a package that extracts the listing type and provides some logging information. The model achieved 95% accuracy across all categories; however, it struggled with listings that can be identified as new developments. This approach showed that a state-of-the-art model, which can be complicated to understand, is not always needed to obtain good results.; Gonçalves, Rui Alexandre Henriques; RUN; Serras, Alexandra Martins; 2020-12-23T16:25:02Z; 2020-10-30; 2020-10-30T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/109201; TID:202569837; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T04:53:43Z; oai:run.unl.pt:10362/109201; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:41:26.496567; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Type extraction from real estate listings |
title |
Type extraction from real estate listings |
author |
Serras, Alexandra Martins |
author_facet |
Serras, Alexandra Martins |
author_role |
author |
dc.contributor.none.fl_str_mv |
Gonçalves, Rui Alexandre Henriques; RUN |
dc.contributor.author.fl_str_mv |
Serras, Alexandra Martins |
dc.subject.por.fl_str_mv |
Text Classification; Real Estate; XGBoost; Supervised Learning |
topic |
Text Classification; Real Estate; XGBoost; Supervised Learning |
description |
Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-12-23T16:25:02Z; 2020-10-30; 2020-10-30T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/109201; TID:202569837 |
url |
http://hdl.handle.net/10362/109201 |
identifier_str_mv |
TID:202569837 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138027621056512 |