Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences

Meira, Jorge; Carneiro, João; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo; Novais, Paulo; Marreiros, Goreti

Bibliographic details
Main author:	Meira, Jorge
Publication date:	2022
Other authors:	Carneiro, João; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo; Novais, Paulo; Marreiros, Goreti
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10400.22/22044
Abstract:	Argumentation-based dialogue models have been shown to be appropriate for decision contexts in which the goal is to overcome the lack of interaction between decision-makers, whether because they are dispersed, too numerous, or simply unknown. However, supporting decision processes with argumentation-based dialogue models requires knowledge of aspects specific to each decision-maker, such as preferences, interests, and limitations. Failure to obtain this knowledge could ruin the model’s success. In this work, we sought to facilitate the information acquisition process by studying strategies to automatically predict tourists’ preferences (ratings) for points of interest based on their reviews. We explored different Machine Learning methods to predict users’ ratings. We used Natural Language Processing strategies to predict whether a review is positive or negative and the rating assigned by users on a scale of 1 to 5. We then applied supervised methods such as Logistic Regression, Random Forest, Decision Trees, K-Nearest Neighbors, and Recurrent Neural Networks to determine whether a tourist likes or dislikes a given point of interest. We also took an approach that is distinctive in this field, applying unsupervised anomaly detection techniques. The goal was to improve the supervised model so that it identifies only those tourists who truly like or dislike a particular point of interest; the main objective is not to identify everyone, but above all to avoid errors among those who are identified. The experiments showed that the developed models could predict with high accuracy whether a review is positive or negative but had some difficulty in accurately predicting the rating assigned by users. The unsupervised Local Outlier Factor method improved the results, reducing Logistic Regression false positives at the cost of increasing false negatives.
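
The trade-off reported in the abstract (fewer false positives, more false negatives) can be illustrated with a small sketch. The abstract does not say how the Local Outlier Factor output is combined with the Logistic Regression predictions, so the veto rule below is an assumption: a positive prediction is kept only when the review is not flagged as an outlier. The labels, predictions, and outlier flags are made-up toy values, and the function names `confusion_counts` and `veto_outliers` are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = likes, 0 = dislikes)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "tn", "fn")}


def veto_outliers(y_pred, is_outlier):
    """Assumed combination rule: keep a positive prediction only if the
    review was not flagged as an outlier; everything else becomes 0."""
    return [1 if pred == 1 and not flag else 0
            for pred, flag in zip(y_pred, is_outlier)]


# Toy values, invented for illustration only (not results from the article).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]  # stand-in for classifier output
is_outlier = [False, False, True, False, True, False, False, False]

before = confusion_counts(y_true, y_pred)
after = confusion_counts(y_true, veto_outliers(y_pred, is_outlier))

print("before veto:", before)
print("after veto: ", after)
```

In this toy run the veto removes one false positive but also turns one true positive into a false negative, which is the kind of exchange the abstract describes. Such a filter suits settings where a wrong "likes" prediction costs more than a missed one.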

Item metadata

id	RCAP_0e105cb0609827e952120107e871684e
oai_identifier_str	oai:recipp.ipp.pt:10400.22/22044
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences; Machine Learning; Natural Language Processing; Sentiment analysis; Argumentation-based dialogues; Tourism; TripAdvisor. This work was supported by the GrouPlanner Project under the European Regional Development Fund POCI-01-0145-FEDER-29178 and by National Funds through the FCT (Fundação para a Ciência e a Tecnologia, Portuguese Foundation for Science and Technology) within the Projects UIDB/00319/2020 and UIDP/00760/2020.
dc.title.none.fl_str_mv	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences
title	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences
spellingShingle	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences Meira, Jorge Machine Learning Natural Language Processing Sentiment analysis Argumentation-based dialogues Tourism TripAdvisor
title_short	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences
title_full	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences
title_fullStr	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences
title_full_unstemmed	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences
title_sort	Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences
author	Meira, Jorge
author_facet	Meira, Jorge Carneiro, João Bolón-Canedo, Verónica Alonso-Betanzos, Amparo Novais, Paulo Marreiros, Goreti
author_role	author
author2	Carneiro, João Bolón-Canedo, Verónica Alonso-Betanzos, Amparo Novais, Paulo Marreiros, Goreti
author2_role	author author author author author
dc.contributor.none.fl_str_mv	Repositório Científico do Instituto Politécnico do Porto
dc.contributor.author.fl_str_mv	Meira, Jorge Carneiro, João Bolón-Canedo, Verónica Alonso-Betanzos, Amparo Novais, Paulo Marreiros, Goreti
dc.subject.por.fl_str_mv	Machine Learning Natural Language Processing Sentiment analysis Argumentation-based dialogues Tourism TripAdvisor
topic	Machine Learning Natural Language Processing Sentiment analysis Argumentation-based dialogues Tourism TripAdvisor
publishDate	2022
dc.date.none.fl_str_mv	2022 2022-01-01T00:00:00Z 2023-02-01T09:50:40Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10400.22/22044
url	http://hdl.handle.net/10400.22/22044
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	10.3390/electronics11050779
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	MDPI
publisher.none.fl_str_mv	MDPI
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799131507915227136
