A sentiment analysis approach to increase authorship identification

Bibliographic details
Main author: Martins, Ricardo
Publication date: 2021
Other authors: Almeida, J. J.; Henriques, Pedro Rangel; Novais, Paulo
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/1822/68848
Abstract: Writing style is the manner in which authors express their thoughts, and it is influenced by language characteristics, period, school, or nation. This writing style can often identify the author. One of the most famous examples in Portuguese literature dates from 1914: Fernando Pessoa and his heteronyms Alberto Caeiro, Álvaro de Campos, and Ricardo Reis had such completely different writing styles that people believed they were different individuals. Authorship identification is now even more relevant because of the large amount of fake news spreading on social media. In that setting it is hard to identify who authored a text, and even a simple quote can affect an author's public image, especially when the texts or quotes are attributed to politicians. This paper presents a process that analyses the emotion contained in social media messages, such as those on Facebook, to identify the author's emotional profile. The process then uses that profile to improve the prediction of the message's author. Using preprocessing techniques, lexicon-based approaches, and machine learning, we improved authorship identification by approximately 5% on the whole dataset and by more than 50% for specific authors when the emotional profile was considered as part of the writing style. This increases the ability to identify the author of a text by considering only the author's emotional profile, previously detected from prior texts.
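The abstract only summarizes the pipeline (preprocessing, a lexicon-based emotion analysis, and a machine-learning step). The following sketch illustrates the general idea and is not the authors' implementation. It builds an emotional profile for each author from prior texts and attributes a new text to the author whose profile is most similar. The toy lexicon, the four emotion categories, the function names, and the nearest-profile rule based on cosine similarity are all hypothetical stand-ins. A real system would use a published emotion lexicon and a trained classifier.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy lexicon mapping words to emotion categories.
# A real system would use a published emotion lexicon instead.
EMOTION_LEXICON = {
    "happy": "joy", "great": "joy", "love": "joy",
    "angry": "anger", "outrage": "anger", "hate": "anger",
    "afraid": "fear", "threat": "fear", "worry": "fear",
    "sad": "sadness", "loss": "sadness", "sorry": "sadness",
}
EMOTIONS = ("joy", "anger", "fear", "sadness")


def tokenize(text: str) -> list[str]:
    """Minimal preprocessing: lowercase and keep runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def emotion_profile(text: str) -> list[float]:
    """Share of each emotion among the lexicon words found in the text."""
    counts = Counter(EMOTION_LEXICON[t] for t in tokenize(text) if t in EMOTION_LEXICON)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(EMOTIONS)
    return [counts[e] / total for e in EMOTIONS]


def author_profiles(corpus: dict[str, list[str]]) -> dict[str, list[float]]:
    """Average the emotion profiles of each author's prior texts (at least one per author)."""
    profiles = {}
    for author, texts in corpus.items():
        vectors = [emotion_profile(t) for t in texts]
        profiles[author] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return profiles


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_author(text: str, profiles: dict[str, list[float]]) -> str:
    """Attribute the text to the author with the most similar emotional profile.
    A text with no lexicon hits scores 0.0 for every author, so max() returns the first one."""
    vector = emotion_profile(text)
    return max(profiles, key=lambda author: cosine(vector, profiles[author]))


if __name__ == "__main__":
    prior_texts = {
        "author_a": ["I love this great day, so happy", "What a great team, love it"],
        "author_b": ["I hate this outrage", "Angry about the threat, I worry"],
    }
    profiles = author_profiles(prior_texts)
    print(predict_author("Such an outrage, I hate it", profiles))  # → author_b
```

The nearest-profile rule is used here only because it keeps the example dependency-free. A trained classifier over the emotion vector, possibly combined with other stylometric features, would be closer to the machine-learning step the abstract mentions, but the abstract does not specify which model or features were used.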
Keywords: machine learning; natural language processing; sentiment analysis
Subject areas: Natural Sciences::Computer and Information Sciences; Engineering and Technology::Electrical, Electronic and Computer Engineering; Science & Technology
Publisher: Wiley
Contributor: Universidade do Minho
ISSN: 0266-4720
DOI: 10.1111/exsy.12469
Publisher page: https://onlinelibrary.wiley.com/doi/abs/10.1111/exsy.12469
Version: published version
Access rights: open access
Format: application/pdf
Funding: This work was supported by FCT (Fundação para a Ciência e Tecnologia) within the Project Scope: UID/CEC/00319/2019.
Aggregator: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)
OAI identifier: oai:repositorium.sdum.uminho.pt:1822/68848