Social media cross-source and cross-domain sentiment classification
Main author: | Zola, Paola |
---|---|
Publication date: | 2019 |
Other authors: | Cortez, Paulo; Ragno, Costantino; Brentari, Eugenio |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/1822/62770 |
Abstract: | Due to the expansion of the Internet and the Web 2.0 phenomenon, there is growing interest in the sentiment analysis of freely opinionated text. In this paper, we propose a novel cross-source, cross-domain sentiment classification approach, in which cross-domain labeled Web sources (Amazon and Tripadvisor) are used to train supervised learning models (including two deep learning algorithms) that are tested on typically unlabeled social media reviews (Facebook and Twitter). We explored a three-step methodology, in which distinct balanced training, text preprocessing and machine learning methods were tested, using two languages: English and Italian. The best results were achieved when using undersampling training and a Convolutional Neural Network. Interesting cross-source classification performances were achieved, in particular when using Amazon and Tripadvisor reviews to train a model that is tested on Facebook data for both English and Italian. |
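
The abstract names undersampling as the balanced-training strategy that gave the best results. As a rough illustration only, the sketch below shows one common form of it, random undersampling to the size of the rarest class. It is not the authors' implementation; the function name `undersample`, the sample reviews and the `pos`/`neg` labels are all hypothetical stand-ins for labeled Amazon and Tripadvisor data.

```python
import random
from collections import Counter, defaultdict


def undersample(texts, labels, seed=0):
    """Randomly drop examples so every class keeps as many items as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    # Size of the smallest class sets the quota for all classes.
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for text in rng.sample(items, n_min):
            balanced.append((text, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical labeled source reviews (assumed data, not from the paper).
source_texts = ["great phone", "love this hotel", "works well", "terrible room", "nice staff"]
source_labels = ["pos", "pos", "pos", "neg", "pos"]

train = undersample(source_texts, source_labels)
counts = Counter(label for _, label in train)
print(counts)  # each class now has the same number of examples
```

In a cross-source setting, the balanced set would be used only for training, and the resulting model would be evaluated on a different source (for example, social media posts) rather than on held-out reviews from the training sources.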
id |
RCAP_908250926c5482ade5bb56c8e917df72 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/62770 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
acknowledgments |
Research carried out with the support of resources of Big&Open Data Innovation Laboratory (BODaI-Lab), the University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT - Fundacao para a Ciencia e Tecnologia within the Project Scope UID/CEC/00319/2019. We would also like to thank the three anonymous reviewers for their helpful suggestions. |
author |
Zola, Paola |
author_facet |
Zola, Paola; Cortez, Paulo; Ragno, Costantino; Brentari, Eugenio |
author_role |
author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.subject.por.fl_str_mv |
Convolutional neural network; cross-domain data; sentiment analysis; social media; Engineering and Technology::Electrical, Electronic and Information Engineering; Science & Technology |
dc.date.none.fl_str_mv |
2019 2019-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1822/62770 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
World Scientific, 18(5): 1469-1499, September 2019. ISSN: 0219-6220. DOI: 10.1142/S0219622019500305 (https://doi.org/10.1142/S0219622019500305) |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
World Scientific |
dc.source.none.fl_str_mv |
reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799132984142462976 |