Social media cross-source and cross-domain sentiment classification

Bibliographic details
Main author: Zola, Paola
Publication date: 2019
Other authors: Cortez, Paulo; Ragno, Costantino; Brentari, Eugenio
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/1822/62770
Abstract: Due to the expansion of the Internet and the Web 2.0 phenomenon, there is a growing interest in the sentiment analysis of freely opinionated text. In this paper, we propose a novel cross-source, cross-domain sentiment classification approach, in which cross-domain labeled Web sources (Amazon and Tripadvisor) are used to train supervised learning models (including two deep learning algorithms) that are then tested on typically unlabeled social media reviews (Facebook and Twitter). We explored a three-step methodology in which distinct training-data balancing, text preprocessing, and machine learning methods were tested for two languages: English and Italian. The best results were achieved when using undersampling for training and a Convolutional Neural Network. Interesting cross-source classification results were achieved, in particular when Amazon and Tripadvisor reviews were used to train a model that was tested on Facebook data, for both English and Italian.
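The record does not describe how the cross-source setup is implemented. As a minimal sketch under stated assumptions, the Python below illustrates its shape: labeled "source" reviews are balanced by random undersampling, a classifier is trained on them, and it is evaluated on separate "target" posts. The classifier is a swapped-in multinomial Naive Bayes, not the Convolutional Neural Network the abstract reports; `undersample`, `tokenize`, `NaiveBayes`, and all example texts are hypothetical and made up for illustration, not data or code from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def undersample(texts, labels, seed=0):
    """Random undersampling: keep, for every class, as many randomly chosen
    examples as the smallest class has, then shuffle the result."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    n_min = min(len(items) for items in by_class.values())
    rng = random.Random(seed)
    pairs = [(t, lab) for lab, items in by_class.items() for t in rng.sample(items, n_min)]
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [lab for _, lab in pairs]


def tokenize(text):
    # Hypothetical, minimal preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a simple stand-in
    classifier, NOT the CNN used in the paper)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the class
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in the source data
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up "source" reviews (Amazon/Tripadvisor-like), imbalanced: 4 pos, 2 neg.
source_texts = [
    "great product works perfectly", "excellent hotel friendly staff",
    "love it great value", "terrible service never again",
    "awful product broke", "great location nice room",
]
source_labels = ["pos", "pos", "pos", "neg", "neg", "pos"]
# Made-up "target" social media posts, used only for testing.
target_texts = ["great staff love it", "awful service"]
target_labels = ["pos", "neg"]

X, y = undersample(source_texts, source_labels)
model = NaiveBayes().fit(X, y)
preds = [model.predict(t) for t in target_texts]
accuracy = sum(p == g for p, g in zip(preds, target_labels)) / len(target_labels)
print(preds, accuracy)
```

Undersampling trades away majority-class examples for a balanced class prior; here the two negative reviews are kept and two of the four positive ones are drawn at random, so the trained model never sees the target domain's texts before testing.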
OAI identifier: oai:repositorium.sdum.uminho.pt:1822/62770
Acknowledgements: Research carried out with the support of resources of the Big&Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT (Fundação para a Ciência e Tecnologia) within the Project Scope UID/CEC/00319/2019. We would also like to thank the three anonymous reviewers for their helpful suggestions.
Contributor: Universidade do Minho
Keywords: Convolutional neural network; cross-domain data; sentiment analysis; social media; Facebook; Twitter
Subject areas: Engineering and Technology::Electrical, Electronic and Computer Engineering; Science & Technology
Version: published version
Published in: World Scientific, 18(5): 1469-1499, September 2019, ISSN 0219-6220
DOI: 10.1142/S0219622019500305 (https://doi.org/10.1142/S0219622019500305)
Access rights: open access
Format: application/pdf
Publisher: World Scientific
Repository institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)