Anomaly detection on natural language processing to improve predictions on tourist preferences
Main author: | Meira, Jorge |
---|---|
Publication date: | 2022 |
Other authors: | Carneiro, João; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo; Novais, Paulo; Marreiros, Goreti |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/1822/79084 |
Abstract: | Argumentation-based dialogue models have proven appropriate for decision contexts in which the lack of interaction between decision-makers must be overcome, whether because they are dispersed, too numerous, or simply unknown. However, supporting decision processes with argumentation-based dialogue models requires knowledge of aspects specific to each decision-maker, such as preferences, interests, and limitations. Failure to obtain this knowledge could ruin the model's success. In this work, we sought to facilitate the information acquisition process by studying strategies to automatically predict tourists' preferences (ratings) for points of interest based on their reviews. We explored different Machine Learning methods to predict users' ratings, using Natural Language Processing strategies to predict both whether a review is positive or negative and the rating assigned by the user on a scale of 1 to 5. We applied supervised methods such as Logistic Regression, Random Forest, Decision Trees, K-Nearest Neighbors, and Recurrent Neural Networks to determine whether a tourist likes or dislikes a given point of interest. We also took an approach that is distinctive in this field, applying unsupervised anomaly detection techniques. The goal was to improve the supervised model so that it identifies only those tourists who truly like or dislike a particular point of interest: the main objective is not to identify everyone, but to avoid errors on those who are identified. The experiments showed that the developed models predict with high accuracy whether a review is positive or negative but have some difficulty predicting the exact rating assigned by users. The unsupervised Local Outlier Factor method improved the results, reducing the false positives of Logistic Regression at the cost of increasing false negatives. |
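The trade-off described at the end of the abstract can be illustrated with a small sketch. This record does not say how Local Outlier Factor was combined with Logistic Regression, so the filtering step below is an assumption: a review already predicted as "likes" is kept only if it does not sit in a sparse region among the other predicted positives. The 2-D feature vectors, the function names, and the threshold of 1.5 are hypothetical stand-ins, not values from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_outlier_factor(points, k=2):
    """Return the LOF score of every point; scores well above 1 suggest an outlier.

    Simplification: ties at the k-th neighbour are cut at exactly k neighbours.
    """
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding the point itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbour.
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(max(k_distance[j], dist[i][j]) for j in neighbours[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def keep_confident_positives(features, predicted_positive, k=2, threshold=1.5):
    """Hypothetical filter: keep predicted positives whose LOF is at most `threshold`."""
    idx = [i for i, flag in enumerate(predicted_positive) if flag]
    scores = local_outlier_factor([features[i] for i in idx], k)
    return [i for i, score in zip(idx, scores) if score <= threshold]


# Hypothetical data: four tightly grouped predicted positives, one isolated
# predicted positive, and one review the classifier already labelled negative.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.5)]
predicted_positive = [True, True, True, True, True, False]

kept = keep_confident_positives(features, predicted_positive)
print(kept)
```

Under this assumed design, the isolated prediction is discarded rather than trusted. If it was a wrong "likes" prediction, a false positive disappears; if it was right, it becomes a false negative, which mirrors the cost the abstract reports.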
id |
RCAP_d0ff2fb392f7920999dbc8e017aff55c |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/79084 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Anomaly detection on natural language processing to improve predictions on tourist preferences; Machine Learning; Natural Language Processing; Sentiment analysis; Argumentation-based dialogues; Tourism; TripAdvisor; Science & Technology. This work was supported by the GrouPlanner Project under the European Regional Development Fund POCI-01-0145-FEDER-29178 and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within the Projects UIDB/00319/2020 and UIDP/00760/2020. Multidisciplinary Digital Publishing Institute (MDPI); Universidade do Minho; Meira, Jorge; Carneiro, João; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo; Novais, Paulo; Marreiros, Goreti; 2022-03-03; 2022-03-03T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; https://hdl.handle.net/1822/79084; eng; Meira, J.; Carneiro, J.; Bolón-Canedo, V.; Alonso-Betanzos, A.; Novais, P.; Marreiros, G. Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences. Electronics 2022, 11, 779. https://doi.org/10.3390/electronics11050779; 2079-9292; 10.3390/electronics11050779; 779; https://www.mdpi.com/2079-9292/11/5/779; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-07-21T12:29:58Z; oai:repositorium.sdum.uminho.pt:1822/79084; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T19:25:04.052620; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Anomaly detection on natural language processing to improve predictions on tourist preferences |
author |
Meira, Jorge |
author_facet |
Meira, Jorge; Carneiro, João; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo; Novais, Paulo; Marreiros, Goreti |
author_role |
author |
author2 |
Carneiro, João; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo; Novais, Paulo; Marreiros, Goreti |
author2_role |
author author author author author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Meira, Jorge; Carneiro, João; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo; Novais, Paulo; Marreiros, Goreti |
dc.subject.por.fl_str_mv |
Machine Learning; Natural Language Processing; Sentiment analysis; Argumentation-based dialogues; Tourism; TripAdvisor; Science & Technology |
topic |
Machine Learning; Natural Language Processing; Sentiment analysis; Argumentation-based dialogues; Tourism; TripAdvisor; Science & Technology |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-03-03 2022-03-03T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/1822/79084 |
url |
https://hdl.handle.net/1822/79084 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Meira, J.; Carneiro, J.; Bolón-Canedo, V.; Alonso-Betanzos, A.; Novais, P.; Marreiros, G. Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences. Electronics 2022, 11, 779. https://doi.org/10.3390/electronics11050779; ISSN 2079-9292; DOI 10.3390/electronics11050779; article number 779; https://www.mdpi.com/2079-9292/11/5/779 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Multidisciplinary Digital Publishing Institute (MDPI) |
publisher.none.fl_str_mv |
Multidisciplinary Digital Publishing Institute (MDPI) |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799132732560769024 |