A sentiment analysis approach to increase authorship identification
Main author: | Martins, Ricardo |
---|---|
Publication date: | 2021 |
Other authors: | Almeida, J. J.; Henriques, Pedro Rangel; Novais, Paulo |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/1822/68848 |
Abstract: | Writing style is considered the manner in which an author expresses his thoughts, influenced by language characteristics, period, school, or nation. Often, this writing style can identify the author. One of the most famous examples comes from 1914 in Portuguese literature: Fernando Pessoa and his heteronyms Alberto Caeiro, Álvaro de Campos, and Ricardo Reis had completely different writing styles, which led people to believe that they were different individuals. Currently, the discussion of authorship identification is more relevant because of the considerable amount of fake news spread on social media, where it is hard to identify who authored a text and even a simple quote can affect the public image of an author, especially if these texts or quotes are from politicians. This paper presents a process to analyse the emotion contained in messages from social media such as Facebook, to identify the author's emotional profile, and to use it to improve the ability to predict the author of the message. Using preprocessing techniques, lexicon-based approaches, and machine learning, we achieved an authorship identification improvement of approximately 5% in the whole dataset and more than 50% for specific authors when considering the emotional profile on the writing style, thus increasing the ability to identify the author of a text by considering only the author's emotional profile, previously detected from prior texts. |
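
The abstract mentions a lexicon-based step that turns an author's messages into an emotional profile. The record does not describe the actual lexicon, preprocessing, or classifier, so the following is only a minimal sketch of the general idea under stated assumptions: `TOY_LEXICON`, `EMOTIONS`, `tokenize`, and `emotion_profile` are hypothetical names, and the six lexicon entries are illustrative rather than taken from the paper.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotion). A real system would load a
# published emotion lexicon; these entries are illustrative assumptions.
TOY_LEXICON = {
    "happy": "joy",
    "love": "joy",
    "angry": "anger",
    "hate": "anger",
    "afraid": "fear",
    "sad": "sadness",
}
EMOTIONS = ("anger", "fear", "joy", "sadness")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def emotion_profile(messages):
    """Return the share of lexicon hits per emotion, in EMOTIONS order."""
    counts = Counter()
    for message in messages:
        for token in tokenize(message):
            emotion = TOY_LEXICON.get(token)
            if emotion is not None:
                counts[emotion] += 1
    total = sum(counts.values())
    # Messages with no lexicon hits yield an all-zero profile.
    return [counts[e] / total if total else 0.0 for e in EMOTIONS]


profile = emotion_profile(["I love this, so happy!", "I hate delays."])
print(dict(zip(EMOTIONS, profile)))
```

A profile vector of this kind could, for instance, be appended to stylometric features before training an authorship classifier; whether the paper combines the features this way is not stated in this record.
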
id |
RCAP_b458e42bdb68fca1d8a3d385d5f4b124 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/68848 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
A sentiment analysis approach to increase authorship identification; machine learning; natural language processing; sentiment analysis; Ciências Naturais::Ciências da Computação e da Informação; Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática; Science & Technology; FCT has supported this work – Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2019.; Wiley; Universidade do Minho; Martins, Ricardo; Almeida, J. J.; Henriques, Pedro Rangel; Novais, Paulo; 2021; 2021-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; http://hdl.handle.net/1822/68848; eng; 0266-4720; 10.1111/exsy.12469; https://onlinelibrary.wiley.com/doi/abs/10.1111/exsy.12469; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-05-11T06:44:00Z; oai:repositorium.sdum.uminho.pt:1822/68848; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; mluisa.alvim@gmail.com; opendoar:7160; 2024-05-11T06:44; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
A sentiment analysis approach to increase authorship identification |
title |
A sentiment analysis approach to increase authorship identification |
spellingShingle |
A sentiment analysis approach to increase authorship identification; Martins, Ricardo; machine learning; natural language processing; sentiment analysis; Ciências Naturais::Ciências da Computação e da Informação; Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática; Science & Technology |
title_short |
A sentiment analysis approach to increase authorship identification |
title_full |
A sentiment analysis approach to increase authorship identification |
title_fullStr |
A sentiment analysis approach to increase authorship identification |
title_full_unstemmed |
A sentiment analysis approach to increase authorship identification |
title_sort |
A sentiment analysis approach to increase authorship identification |
author |
Martins, Ricardo |
author_facet |
Martins, Ricardo; Almeida, J. J.; Henriques, Pedro Rangel; Novais, Paulo |
author_role |
author |
author2 |
Almeida, J. J.; Henriques, Pedro Rangel; Novais, Paulo |
author2_role |
author; author; author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Martins, Ricardo; Almeida, J. J.; Henriques, Pedro Rangel; Novais, Paulo |
dc.subject.por.fl_str_mv |
machine learning; natural language processing; sentiment analysis; Ciências Naturais::Ciências da Computação e da Informação; Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática; Science & Technology |
topic |
machine learning; natural language processing; sentiment analysis; Ciências Naturais::Ciências da Computação e da Informação; Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática; Science & Technology |
description |
Writing style is considered the manner in which an author expresses his thoughts, influenced by language characteristics, period, school, or nation. Often, this writing style can identify the author. One of the most famous examples comes from 1914 in Portuguese literature: Fernando Pessoa and his heteronyms Alberto Caeiro, Álvaro de Campos, and Ricardo Reis had completely different writing styles, which led people to believe that they were different individuals. Currently, the discussion of authorship identification is more relevant because of the considerable amount of fake news spread on social media, where it is hard to identify who authored a text and even a simple quote can affect the public image of an author, especially if these texts or quotes are from politicians. This paper presents a process to analyse the emotion contained in messages from social media such as Facebook, to identify the author's emotional profile, and to use it to improve the ability to predict the author of the message. Using preprocessing techniques, lexicon-based approaches, and machine learning, we achieved an authorship identification improvement of approximately 5% in the whole dataset and more than 50% for specific authors when considering the emotional profile on the writing style, thus increasing the ability to identify the author of a text by considering only the author's emotional profile, previously detected from prior texts. |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021; 2021-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1822/68848 |
url |
http://hdl.handle.net/1822/68848 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
0266-4720; 10.1111/exsy.12469; https://onlinelibrary.wiley.com/doi/abs/10.1111/exsy.12469 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Wiley |
publisher.none.fl_str_mv |
Wiley |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817545075035471872 |