GOTCHA BOT DETECTION: CONTEXT, TIME AND PLACE MATTERS

Bibliographic details
Main author: Santini, Rose Marie
Publication date: 2023
Other authors: Salles, Débora; Ferreira, Fernando; Grael, Felipe
Document type: preprint
Language: eng
Source title: SciELO Preprints
Full text: https://preprints.scielo.org/index.php/scielo/preprint/view/5974
Abstract: Bot detection is increasingly relevant, given that automated accounts play a disproportionate role in spreading disinformation, controlling social interactions, influencing social media algorithms and manufacturing public opinion online for different purposes. Defining, describing and detecting automated manipulation techniques has proved challenging as the technology quickly advances in reach and sophistication. Given the highly contextual character of social science research, the use of off-the-shelf detection tools raises questions about the applicability of machine learning systems across different cases, times and places. Our purpose is therefore to discuss the role of computational methods, focusing on the limitations and potential of machine learning systems for identifying bots on social media platforms. To this end, we analyze the performance of Botometer, a widely adopted detection tool, in a specific domain (Amazon Forest Fires) and language (Portuguese), and propose a supervised machine learning classifier, called Gotcha, based on Botometer's framework and trained on this specific dataset. We also examine how our classifier behaves and evolves over time, and run tests to evaluate the generalization capabilities of the retrained model. Our results show that supervised methods do not perform well on datasets that present features on which the system was not directly trained, such as language and topic. Hence, our study shows that a successful computational model does not always guarantee reliable results when applied to a specific real case. Our findings indicate the need for social scientists to confirm the reliability of tools created and tested only through the prism of computational studies before applying them to empirical social science research.
id SCI-1_fb05c1bef9ee22c78187306c405cdc1a
oai_identifier_str oai:ops.preprints.scielo.org:preprint/5974
network_acronym_str SCI-1
network_name_str SciELO Preprints
repository_id_str
dc.title.none.fl_str_mv GOTCHA BOT DETECTION: CONTEXT, TIME AND PLACE MATTERS
dc.contributor.author.fl_str_mv Santini, Rose Marie
Salles, Débora
Ferreira, Fernando
Grael, Felipe
dc.subject.por.fl_str_mv Bot detection
machine learning algorithm
Brazil
computational propaganda
publishDate 2023
dc.date.none.fl_str_mv 2023-05-05
dc.type.driver.fl_str_mv info:eu-repo/semantics/preprint
info:eu-repo/semantics/publishedVersion
format preprint
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://preprints.scielo.org/index.php/scielo/preprint/view/5974
10.1590/SciELOPreprints.5974
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv https://preprints.scielo.org/index.php/scielo/article/view/5974/11503
dc.rights.driver.fl_str_mv Copyright (c) 2023 Rose Marie Santini, Débora Salles, Fernando Ferreira, Felipe Grael
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv SciELO Preprints
dc.source.none.fl_str_mv reponame:SciELO Preprints
instname:Scientific Electronic Library Online (SCIELO)
instacron:SCI
instname_str Scientific Electronic Library Online (SCIELO)
instacron_str SCI
institution SCI
reponame_str SciELO Preprints
collection SciELO Preprints
repository.name.fl_str_mv SciELO Preprints - Scientific Electronic Library Online (SCIELO)
repository.mail.fl_str_mv scielo.submission@scielo.org
_version_ 1797047811629383680