Individuation of Authorship and Style Identification: analysis of Literary works carried with R

Bibliographic details
Main author: Lima e Silva, Luis Filipe
Publication date: 2022
Other authors: Santos Ciríaco, Larissa
Document type: Article
Language: Portuguese (por)
Source title: Fórum Linguístico
Full text: https://periodicos.ufsc.br/index.php/forum/article/view/79086
Abstract: This paper adds to the available work on Natural Language Processing by demonstrating how programming languages such as R (R CORE TEAM, 2020) can be useful in detecting authorship and identifying an author's style in literary works. Two authors were selected, with two works each: The Adventures of Tom Sawyer (1876) and Adventures of Huckleberry Finn (1884) by Mark Twain (1835-1910), and Typee: A Peep at Polynesian Life (1846) and Omoo: A Narrative of Adventures in the South Seas (1847) by Herman Melville (1819-1891). The data were then analyzed following the methodology of Eder et al. (2016), in order to test the effectiveness of the stylo package and to apply the Principal Component Analysis, Cluster Analysis and Consensus Tree methods. The results showed that each of the tested methods was able to distinguish the works of the two authors, which supports the effectiveness of the package. In addition, a stylometric analysis was performed using Craig's Zeta and Rolling Delta methods. For the latter, works by two German-speaking authors, Franz Kafka and Heinrich von Kleist, were used. The results pointed to a stylistic similarity to von Kleist, especially in Kafka's first work. Finally, Rolling Delta was used to explore an analysis carried out by Juola (2013a, 2013b) of a work by J. K. Rowling written under the pseudonym Robert Galbraith.
id UFSC-24_47c6170f0df8f07a8c80796753d6a8d2
oai_identifier_str oai:periodicos.ufsc.br:article/79086
network_acronym_str UFSC-24
network_name_str Fórum Linguístico
repository_id_str
dc.title.none.fl_str_mv Individuation of Authorship and Style Identification: analysis of Literary works carried with R
Individuación de autoría e identificación de estilo: análisis de obras literárias com R
Individuação de autoria e identificação de estilo: análise de dados linguísticos com auxílio do R
author Lima e Silva, Luis Filipe
author_facet Lima e Silva, Luis Filipe
Santos Ciríaco, Larissa
author_role author
author2 Santos Ciríaco, Larissa
author2_role author
dc.contributor.author.fl_str_mv Lima e Silva, Luis Filipe
Santos Ciríaco, Larissa
publishDate 2022
dc.date.none.fl_str_mv 2022-11-23
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
peer reviewed
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://periodicos.ufsc.br/index.php/forum/article/view/79086
10.5007/1984-8412.2022.e79086
url https://periodicos.ufsc.br/index.php/forum/article/view/79086
identifier_str_mv 10.5007/1984-8412.2022.e79086
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://periodicos.ufsc.br/index.php/forum/article/view/79086/51959
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Programa de Pós-Graduação em Linguística - UFSC
publisher.none.fl_str_mv Programa de Pós-Graduação em Linguística - UFSC
dc.source.none.fl_str_mv Fórum Linguístico; Vol. 19 No. 3 (2022); 8214-8231
Fórum Linguístico; Vol. 19 Núm. 3 (2022); 8214-8231
Fórum Linguístico; v. 19 n. 3 (2022); 8214-8231
1984-8412
1415-8698
reponame:Fórum Linguístico
instname:Universidade Federal de Santa Catarina (UFSC)
instacron:UFSC
instname_str Universidade Federal de Santa Catarina (UFSC)
instacron_str UFSC
institution UFSC
reponame_str Fórum Linguístico
collection Fórum Linguístico
repository.name.fl_str_mv Fórum Linguístico - Universidade Federal de Santa Catarina (UFSC)
repository.mail.fl_str_mv portaldeperiodicos.bu@contato.ufsc.br || atilio.butturi@ufsc.br
_version_ 1797051421053419520