Aspectos linguísticos na descrição de notícias satíricas do português do Brasil: uma proposta tipológica
Main author: | Wick-Pedro, Gabriela |
---|---|
Publication date: | 2022 |
Document type: | Thesis |
Language: | por |
Source title: | Repositório Institucional da UFSCAR |
Full text: | https://repositorio.ufscar.br/handle/ufscar/17089 |
Abstract: | Deceptive content on the web and in messaging applications has become a major contemporary problem. This has prompted initiatives in Linguistics and Computing to characterize such texts linguistically and to detect them automatically. According to Rubin, Chen and Conroy (2015), there are three traditional types of deceptive content: i) fabricated news, produced by what is called the yellow press or tabloids; ii) rumors, news disguised to deceive the public, which traditional news agencies may release through carelessness; and iii) satirical news, which resembles real news but is created for humorous purposes. Theoretically, according to Simpson (2003), satire can be defined on the basis of a triad, as a discursive practice that establishes and results in an ironic incongruity among a satirical target, a satirical author and a satirical audience, and whose purpose is to criticize or mock the satirical target. Thus, if not recognized as humorous content, satirical news can cause misunderstanding and false beliefs in less attentive readers. Automatically detecting satirical news is therefore relevant from a linguistic-computational perspective, especially given the scarcity of work in the literature on the computational analysis of satire and the absence of such work for Portuguese. This work reports the construction of a corpus of satirical news and a parallel set of true news in Brazilian Portuguese. The corpus comprises a subcorpus of 150 satirical news articles (22,963 words and 1,212 sentences) extracted from the Sensacionalista website and a subcorpus of 150 real news articles (107,133 words and 5,721 sentences) extracted from several online news portals and corresponding to the satirical articles. The whole corpus totals about 130 thousand words and 6,900 sentences. Furthermore, this work analyzes and describes the morphosyntactic aspects, the differences in verb occurrences in satirical news, and the main lexical characteristics found in satirical and true articles. To this end, the corpus was automatically annotated with the PALAVRAS parser (BICK, 2000). The NILC-Metrix tools (LEAL, 2021) were used to measure textual complexity, and LIWC (PENNEBAKER et al., 2015), which evaluates emotional, cognitive and structural components of a text using a dictionary that classifies words into categories, was also applied. Finally, this research is expected to contribute to the linguistic description of satirical news and, through its results, to lay foundations for future Natural Language Processing (NLP) work on the automatic identification of deceptive content in Brazilian Portuguese. |
id |
SCAR_48a6188b2d16a5f1540ad30b5a072102 |
---|---|
oai_identifier_str |
oai:repositorio.ufscar.br:ufscar/17089 |
network_acronym_str |
SCAR |
network_name_str |
Repositório Institucional da UFSCAR |
repository_id_str |
4322 |
spelling |
Wick-Pedro, Gabriela; Vale, Oto Araújo Vale; http://lattes.cnpq.br/2277403284693571; http://lattes.cnpq.br/3367416478527735; 2fe327d5-0c62-4271-84b4-788109a35fad
2022-11-29T17:21:02Z; 2022-11-29T17:21:02Z; 2022-10-27
WICK-PEDRO, Gabriela. Aspectos linguísticos na descrição de notícias satíricas do português do Brasil: uma proposta tipológica. 2022. Tese (Doutorado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2022. Disponível em: https://repositorio.ufscar.br/handle/ufscar/17089.
https://repositorio.ufscar.br/handle/ufscar/17089
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); 88882.426864/2019-01; por; Universidade Federal de São Carlos; Câmpus São Carlos; Programa de Pós-Graduação em Linguística - PPGL; UFSCar
Attribution-NonCommercial-NoDerivs 3.0 Brazil; http://creativecommons.org/licenses/by-nc-nd/3.0/br/; info:eu-repo/semantics/openAccess
Notícia satírica; Sátira; Notícia falsa; Pistas linguísticas; Corpus; Satirical news; Satire; Fake news; Linguistic clues; Corpus; LINGUISTICA, LETRAS E ARTES::LINGUISTICA::TEORIA E ANALISE LINGUISTICA
Aspectos linguísticos na descrição de notícias satíricas do português do Brasil: uma proposta tipológica; Linguistic aspects in the description of satirical news in Brazilian Portuguese: a typological proposal
info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/doctoralThesis; 600; 600; 600; 85b6c37a-aa0c-4ee5-9acc-50f6bc172f4f; e2197b6e-20ee-459c-b96e-a164afd732f7
reponame:Repositório Institucional da UFSCAR; instname:Universidade Federal de São Carlos (UFSCAR); instacron:UFSCAR
ORIGINAL; Tese Final.pdf; Tese Final.pdf; Tese Final; application/pdf; 5440819; https://repositorio.ufscar.br/bitstream/ufscar/17089/1/Tese%20Final.pdf; 9d82f2119579ed553caf9b3140f13c35; MD5; 1
Carta Orientador.pdf; Carta Orientador.pdf; Carta orientador; application/pdf; 139165; https://repositorio.ufscar.br/bitstream/ufscar/17089/2/Carta%20Orientador.pdf; 3efabefd5a53ae664840429f38b299b2; MD5; 2
CC-LICENSE; license_rdf; license_rdf; application/rdf+xml; charset=utf-8; 811; https://repositorio.ufscar.br/bitstream/ufscar/17089/3/license_rdf; e39d27027a6cc9cb039ad269a5db8e34; MD5; 3
TEXT; Tese Final.pdf.txt; Tese Final.pdf.txt; Extracted text; text/plain; 366687; https://repositorio.ufscar.br/bitstream/ufscar/17089/4/Tese%20Final.pdf.txt; cc32b15ccbdf3e4791116b99ba365ff1; MD5; 4
Carta Orientador.pdf.txt; Carta Orientador.pdf.txt; Extracted text; text/plain; 1214; https://repositorio.ufscar.br/bitstream/ufscar/17089/6/Carta%20Orientador.pdf.txt; 488a0f1c45b5081ba853cbe889c1acea; MD5; 6
THUMBNAIL; Tese Final.pdf.jpg; Tese Final.pdf.jpg; IM Thumbnail; image/jpeg; 10324; https://repositorio.ufscar.br/bitstream/ufscar/17089/5/Tese%20Final.pdf.jpg; 13b8ccd01d559e19006fb9453b655ec6; MD5; 5
Carta Orientador.pdf.jpg; Carta Orientador.pdf.jpg; IM Thumbnail; image/jpeg; 6409; https://repositorio.ufscar.br/bitstream/ufscar/17089/7/Carta%20Orientador.pdf.jpg; e1946a3379e943ad39fd63e233ea8051; MD5; 7
ufscar/17089; 2023-09-18 18:32:30.671; oai:repositorio.ufscar.br:ufscar/17089; Repositório Institucional; PUB; https://repositorio.ufscar.br/oai/request; opendoar:4322; 2023-09-18T18:32:30; Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR); false |
dc.title.por.fl_str_mv |
Aspectos linguísticos na descrição de notícias satíricas do português do Brasil: uma proposta tipológica |
dc.title.alternative.eng.fl_str_mv |
Linguistic aspects in the description of satirical news in Brazilian Portuguese: a typological proposal |
title |
Aspectos linguísticos na descrição de notícias satíricas do português do Brasil: uma proposta tipológica |
spellingShingle |
Aspectos linguísticos na descrição de notícias satíricas do português do Brasil: uma proposta tipológica; Wick-Pedro, Gabriela; Notícia satírica; Sátira; Notícia falsa; Pistas linguísticas; Corpus; Satirical news; Satire; Fake news; Linguistic clues; Corpus; LINGUISTICA, LETRAS E ARTES::LINGUISTICA::TEORIA E ANALISE LINGUISTICA |
author |
Wick-Pedro, Gabriela |
author_facet |
Wick-Pedro, Gabriela |
author_role |
author |
dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/3367416478527735 |
dc.contributor.author.fl_str_mv |
Wick-Pedro, Gabriela |
dc.contributor.advisor1.fl_str_mv |
Vale, Oto Araújo Vale |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/2277403284693571 |
dc.contributor.authorID.fl_str_mv |
2fe327d5-0c62-4271-84b4-788109a35fad |
contributor_str_mv |
Vale, Oto Araújo Vale |
dc.subject.por.fl_str_mv |
Notícia satírica; Sátira; Notícia falsa; Pistas linguísticas; Corpus |
topic |
Notícia satírica; Sátira; Notícia falsa; Pistas linguísticas; Corpus; Satirical news; Satire; Fake news; Linguistic clues; Corpus; LINGUISTICA, LETRAS E ARTES::LINGUISTICA::TEORIA E ANALISE LINGUISTICA |
dc.subject.eng.fl_str_mv |
Satirical news; Satire; Fake news; Linguistic clues; Corpus |
dc.subject.cnpq.fl_str_mv |
LINGUISTICA, LETRAS E ARTES::LINGUISTICA::TEORIA E ANALISE LINGUISTICA |
publishDate |
2022 |
dc.date.accessioned.fl_str_mv |
2022-11-29T17:21:02Z |
dc.date.available.fl_str_mv |
2022-11-29T17:21:02Z |
dc.date.issued.fl_str_mv |
2022-10-27 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
WICK-PEDRO, Gabriela. Aspectos linguísticos na descrição de notícias satíricas do português do Brasil: uma proposta tipológica. 2022. Tese (Doutorado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2022. Disponível em: https://repositorio.ufscar.br/handle/ufscar/17089. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/ufscar/17089 |
identifier_str_mv |
WICK-PEDRO, Gabriela. Aspectos linguísticos na descrição de notícias satíricas do português do Brasil: uma proposta tipológica. 2022. Tese (Doutorado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2022. Disponível em: https://repositorio.ufscar.br/handle/ufscar/17089. |
url |
https://repositorio.ufscar.br/handle/ufscar/17089 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.confidence.fl_str_mv |
600 600 600 |
dc.relation.authority.fl_str_mv |
85b6c37a-aa0c-4ee5-9acc-50f6bc172f4f e2197b6e-20ee-459c-b96e-a164afd732f7 |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Linguística - PPGL |
dc.publisher.initials.fl_str_mv |
UFSCar |
publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
instname_str |
Universidade Federal de São Carlos (UFSCAR) |
instacron_str |
UFSCAR |
institution |
UFSCAR |
reponame_str |
Repositório Institucional da UFSCAR |
collection |
Repositório Institucional da UFSCAR |
bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstream/ufscar/17089/1/Tese%20Final.pdf https://repositorio.ufscar.br/bitstream/ufscar/17089/2/Carta%20Orientador.pdf https://repositorio.ufscar.br/bitstream/ufscar/17089/3/license_rdf https://repositorio.ufscar.br/bitstream/ufscar/17089/4/Tese%20Final.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/17089/6/Carta%20Orientador.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/17089/5/Tese%20Final.pdf.jpg https://repositorio.ufscar.br/bitstream/ufscar/17089/7/Carta%20Orientador.pdf.jpg |
bitstream.checksum.fl_str_mv |
9d82f2119579ed553caf9b3140f13c35 3efabefd5a53ae664840429f38b299b2 e39d27027a6cc9cb039ad269a5db8e34 cc32b15ccbdf3e4791116b99ba365ff1 488a0f1c45b5081ba853cbe889c1acea 13b8ccd01d559e19006fb9453b655ec6 e1946a3379e943ad39fd63e233ea8051 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
repository.mail.fl_str_mv |
|
_version_ |
1802136415166791680 |