Análise exploratória e experimental sobre detecção inteligente de fake news

Bibliographic details
Main author: Silva, Caio Vinícius Meneses
Publication date: 2020
Document type: Master's thesis
Language: Portuguese (por)
Source title: Repositório Institucional da UFS
Full text: https://ri.ufs.br/jspui/handle/riufs/14136
Abstract: Context: The evolution of the media has contributed to the spread of false news, especially since the emergence of digital social networks. The practice, however, is not new. Reports from the First World War period show the press using misleading propaganda, which led to new standards of journalistic objectivity and balance. On digital social media, the phenomenon, now called fake news, has found an environment in which it spreads on a worldwide scale, making manual checking of this immense volume of data infeasible. In response, work in several fields has tried to minimize the damage caused by the proliferation of fake news. Objective: This work evaluated the effectiveness of the most widely used text-matching methods in the task of automatically detecting fake news about the 2018 Brazilian presidential elections, comparing the evidence found with the results of a mapping of the state of the art published as part of this research. Method: First, a systematic mapping was carried out to identify and characterize the main approaches, techniques, and algorithms used in computing to detect false news. Then a controlled in vitro experiment was carried out, taking as its reference one of the works found in the literature whose context is closely related to this study: the 2016 US elections. The effectiveness of the methods was thus evaluated by comparing the results and contexts of the two works. Results: For the state of the art, the algorithms most used for detecting false news were LSTM (17.14%), Naive Bayes, and Similarity Algorithm (11.43% each). In the experiment, the TF-IDF and BM25 methods obtained statistically similar mean accuracies of 79.86% and 79.00%, respectively. The Word2Vec and Doc2Vec methods obtained the lowest means, 75.69% and 72.39%, respectively. Conclusions: The analysis of the state of the art revealed gaps in work in the Big Data context and a need to replicate existing studies in the form of more controlled experiments. The experimental evaluation found that the effectiveness of the evaluated methods was similar to that of the work used as a control. In addition, considering the universe of fact-checked news available, the period analyzed, and a margin of error of approximately 3.5%, followers of both candidates evaluated in the experiment were found to have spread fake news. Followers of candidate Jair Bolsonaro (PSL) accounted for 62.25% of the tweets related to fake news, against 37.75% for followers of candidate Fernando Haddad (PT). Of the accounts deleted from the social network within a short period, 59.96% belonged to followers of the PSL candidate and 40.04% to followers of the PT candidate. Spreading fake news does not always imply intent; in some cases it only indicates greater engagement.
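The text-matching idea behind the TF-IDF method named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the whitespace tokenizer, the smoothed IDF formula, the function names (`tfidf_vectors`, `cosine`, `best_match`), and the example sentences are all assumptions made for illustration. The sketch scores a short message against a small set of already-checked claims and returns the closest one.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption), so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, candidates):
    """Return (index, score) of the candidate most similar to the query."""
    vecs = tfidf_vectors(candidates + [query])
    query_vec = vecs[-1]
    scores = [cosine(query_vec, v) for v in vecs[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Invented example claims, not data from the thesis.
    checked = [
        "candidate promised to close all schools",
        "voting machines were rigged in the first round",
    ]
    print(best_match("rigged voting machines", checked))
```

In a setting like the one the abstract describes, a message would presumably be flagged as related to a checked claim only when its best score exceeds some threshold; the abstract does not state one, so none is assumed here.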
id UFS-2_a315852d5f71d4f6a6e91f201e0670ad
oai_identifier_str oai:ufs.br:riufs/14136
network_acronym_str UFS-2
network_name_str Repositório Institucional da UFS
repository_id_str
dc.title.pt_BR.fl_str_mv Análise exploratória e experimental sobre detecção inteligente de fake news
dc.contributor.author.fl_str_mv Silva, Caio Vinícius Meneses
dc.contributor.advisor1.fl_str_mv Rodrigues Júnior, Methanias Colaço
dc.subject.por.fl_str_mv Notícias falsas
Eleições
Processamento eletrônico de dados
Mineração de texto
dc.subject.eng.fl_str_mv Fake news
Elections
Text mining
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate 2020
dc.date.issued.fl_str_mv 2020-12-08
dc.date.accessioned.fl_str_mv 2021-04-27T23:34:44Z
dc.date.available.fl_str_mv 2021-04-27T23:34:44Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv SILVA, Caio Vinícius Meneses. Análise exploratória e experimental sobre detecção inteligente de fake news. 2020. 83f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Sergipe, São Cristóvão, Sergipe, 2020.
dc.identifier.uri.fl_str_mv https://ri.ufs.br/jspui/handle/riufs/14136
dc.identifier.license.pt_BR.fl_str_mv Authorization for publication in the Repositório da Universidade Federal de Sergipe (RI-UFS), granted by the author.
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.program.fl_str_mv Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv Universidade Federal de Sergipe
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFS
instname:Universidade Federal de Sergipe (UFS)
instacron:UFS
bitstream.url.fl_str_mv https://ri.ufs.br/jspui/bitstream/riufs/14136/3/CAIO_VINICIUS_MENESES_SILVA.pdf.txt
https://ri.ufs.br/jspui/bitstream/riufs/14136/4/CAIO_VINICIUS_MENESES_SILVA.pdf.jpg
https://ri.ufs.br/jspui/bitstream/riufs/14136/2/CAIO_VINICIUS_MENESES_SILVA.pdf
https://ri.ufs.br/jspui/bitstream/riufs/14136/1/license.txt
bitstream.checksum.fl_str_mv 6f6ee4180a74a5b52cb843474bf7b845
602a7ef86a246d5d00353951a13fd184
d857515da4c02950cbfbfd2e517b1e9f
098cbbf65c2c15e1fb2e49c5d306a44c
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFS - Universidade Federal de Sergipe (UFS)
repository.mail.fl_str_mv repositorio@academico.ufs.br
_version_ 1802110695509065728