Análise exploratória e experimental sobre detecção inteligente de fake news
Main author: | Silva, Caio Vinícius Meneses |
---|---|
Publication date: | 2020 |
Document type: | Master's thesis |
Language: | por |
Source title: | Repositório Institucional da UFS |
Full text: | https://ri.ufs.br/jspui/handle/riufs/14136 |
Abstract: | Context: The evolution of the media has contributed to the spread of false news, especially since the emergence of digital social networks. The practice, however, is not a recent phenomenon in human history. Reports from the First World War period show the press using misleading propaganda, which culminated in new standards of journalistic objectivity and balance. On digital social media, this phenomenon, now called fake news, has found a new environment in which to spread on a global scale, making manual checking of such an immense volume of data unfeasible. In this context, work in several fields has sought to minimize the damage caused by the proliferation of fake news. Objective: This work aimed to evaluate the effectiveness of the most widely used text-matching methods in the task of automatically detecting fake news about the 2018 Brazilian presidential elections, comparing the evidence found with the results of a state-of-the-art mapping published as part of this research. Method: Initially, a systematic mapping was carried out to identify and characterize the main approaches, techniques and algorithms used in computing to detect false news. Then, a controlled in vitro experiment was carried out, taking as its reference one of the works found in the literature whose context is closely related to this study: the 2016 American elections. The effectiveness of the methods was thus evaluated by comparing the results and contexts of the two works. Results: For the state of the art, the main algorithms used to detect false news were found to be LSTM (17.14%), Naive Bayes and Similarity Algorithm (11.43% each). The experiment showed that the TF-IDF and BM25 methods obtained statistically similar mean accuracies, 79.86% and 79.00% respectively. The Word2Vec and Doc2Vec methods obtained slightly lower means, 75.69% and 72.39% respectively. Conclusions: The analysis of the state of the art revealed gaps related to work in the Big Data context and a need to replicate existing studies in the form of more controlled experiments. The experimental evaluation found that the effectiveness of the evaluated methods was similar to that of the work used as a control. In addition, considering the universe of fact-checked news available, the period analyzed and a margin of error of approximately 3.5%, followers of both candidates evaluated in the experiment were found to have spread fake news. Followers of candidate Jair Bolsonaro (PSL) were responsible for 62.25% of the tweets related to fake news, against 37.75% for followers of candidate Fernando Haddad (PT). Of the accounts deleted from the social network within a short period of time, 59.96% belonged to followers of the PSL candidate and 40.04% to followers of the PT candidate. The dissemination of fake news does not always imply intent; in some cases it only indicates greater engagement. |
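The abstract names TF-IDF and BM25 as text-matching methods but does not describe how they were implemented. As a rough illustration only, the sketch below shows how the two scores are commonly computed when matching a short post against a list of fact-checked claims. It is not the thesis's code: the function names, the toy sentences, the whitespace tokenizer, the smoothed IDF, and the BM25 parameters `k1=1.5` and `b=0.75` are all assumptions chosen for the example, and the BM25 formula is one common variant among several.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real pipelines do more)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so that terms present in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tfidf_similarities(query, docs):
    """Cosine similarity of the query against each document under TF-IDF."""
    vectors = tfidf_vectors(docs + [query])
    query_vec = vectors[-1]
    return [cosine(query_vec, v) for v in vectors[:-1]]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """BM25 score of the query against each document (one common variant)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical toy data, not taken from the thesis.
claims = [tokenize(c) for c in (
    "voting machines were replaced before the election",
    "the election date was moved to november",
    "a new bridge opened in the city",
)]
tweet = tokenize("they say the voting machines were replaced")

tfidf_sims = tfidf_similarities(tweet, claims)
bm25 = bm25_scores(tweet, claims)
best_tfidf = tfidf_sims.index(max(tfidf_sims))
best_bm25 = bm25.index(max(bm25))
```

In this toy case both methods rank the first claim highest, since it shares the most distinctive terms with the post. A real evaluation would also need a decision threshold and labeled data to measure accuracy.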
id |
UFS-2_a315852d5f71d4f6a6e91f201e0670ad |
---|---|
oai_identifier_str |
oai:ufs.br:riufs/14136 |
network_acronym_str |
UFS-2 |
network_name_str |
Repositório Institucional da UFS |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Análise exploratória e experimental sobre detecção inteligente de fake news |
title |
Análise exploratória e experimental sobre detecção inteligente de fake news |
author |
Silva, Caio Vinícius Meneses |
author_facet |
Silva, Caio Vinícius Meneses |
author_role |
author |
dc.contributor.author.fl_str_mv |
Silva, Caio Vinícius Meneses |
dc.contributor.advisor1.fl_str_mv |
Rodrigues Júnior, Methanias Colaço |
contributor_str_mv |
Rodrigues Júnior, Methanias Colaço |
dc.subject.por.fl_str_mv |
Notícias falsas Eleições Processamento eletrônico de dados Mineração de texto |
topic |
Notícias falsas Eleições Processamento eletrônico de dados Mineração de texto Fake news Elections Text mining CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
dc.subject.eng.fl_str_mv |
Fake news Elections Text mining |
dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
publishDate |
2020 |
dc.date.issued.fl_str_mv |
2020-12-08 |
dc.date.accessioned.fl_str_mv |
2021-04-27T23:34:44Z |
dc.date.available.fl_str_mv |
2021-04-27T23:34:44Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
SILVA, Caio Vinícius Meneses. Análise exploratória e experimental sobre detecção inteligente de fake news. 2020. 83f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Sergipe, São Cristóvão, Sergipe, 2020. |
dc.identifier.uri.fl_str_mv |
https://ri.ufs.br/jspui/handle/riufs/14136 |
dc.identifier.license.pt_BR.fl_str_mv |
Authorization for publication in the Repository of the Universidade Federal de Sergipe (RI-UFS), granted by the author. |
url |
https://ri.ufs.br/jspui/handle/riufs/14136 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.program.fl_str_mv |
Pós-Graduação em Ciência da Computação |
dc.publisher.initials.fl_str_mv |
Universidade Federal de Sergipe |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFS instname:Universidade Federal de Sergipe (UFS) instacron:UFS |
instname_str |
Universidade Federal de Sergipe (UFS) |
instacron_str |
UFS |
institution |
UFS |
reponame_str |
Repositório Institucional da UFS |
collection |
Repositório Institucional da UFS |
bitstream.url.fl_str_mv |
https://ri.ufs.br/jspui/bitstream/riufs/14136/3/CAIO_VINICIUS_MENESES_SILVA.pdf.txt https://ri.ufs.br/jspui/bitstream/riufs/14136/4/CAIO_VINICIUS_MENESES_SILVA.pdf.jpg https://ri.ufs.br/jspui/bitstream/riufs/14136/2/CAIO_VINICIUS_MENESES_SILVA.pdf https://ri.ufs.br/jspui/bitstream/riufs/14136/1/license.txt |
bitstream.checksum.fl_str_mv |
6f6ee4180a74a5b52cb843474bf7b845 602a7ef86a246d5d00353951a13fd184 d857515da4c02950cbfbfd2e517b1e9f 098cbbf65c2c15e1fb2e49c5d306a44c |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFS - Universidade Federal de Sergipe (UFS) |
repository.mail.fl_str_mv |
repositorio@academico.ufs.br |
_version_ |
1802110695509065728 |