GOTCHA BOT DETECTION: CONTEXT, TIME AND PLACE MATTERS
Main author: | Santini, Rose Marie |
---|---|
Publication date: | 2023 |
Other authors: | Salles, Débora; Ferreira, Fernando; Grael, Felipe |
Document type: | preprint |
Language: | eng |
Source title: | SciELO Preprints |
Full text: | https://preprints.scielo.org/index.php/scielo/preprint/view/5974 |
Abstract: | Bot detection is increasingly relevant considering that automated accounts play a disproportionate role in spreading disinformation, controlling social interactions, influencing social media algorithms and manufacturing public opinion online for different purposes. Definition, description and detection of automated manipulation techniques have proved a challenge as technology quickly advances in reach and sophistication. Given the highly contextual character of social science research, the employment of off-the-shelf detection tools raises questions regarding the applicability of machine learning systems in different cases, times and places. Thus, our purpose is to discuss the role of computational methods, focusing on understanding the limitations and potential of machine learning systems to identify bots on social media platforms. To this end, we analyze the performance of Botometer, a widely adopted detection tool, in a specific domain (Amazon Forest Fires) and language (Portuguese) and propose a supervised machine learning classifier, called Gotcha, based on Botometer's framework and trained for this specific dataset. We also question how our classifier behaves and evolves over time and perform tests to evaluate the generalization capabilities of the retrained model. Our results show that supervised methods do not perform well with datasets that present features on which the system was not directly trained, such as language and topic. Hence, our study shows that a successful computational model does not always guarantee reliable results applicable to a specific real case. Our findings indicate the need for social scientists to confirm the reliability of different tools created and tested only through the prism of computational studies before applying them to empirical social science research. |
id |
SCI-1_fb05c1bef9ee22c78187306c405cdc1a |
---|---|
oai_identifier_str |
oai:ops.preprints.scielo.org:preprint/5974 |
network_acronym_str |
SCI-1 |
network_name_str |
SciELO Preprints |
repository_id_str |
|
dc.title.none.fl_str_mv |
GOTCHA BOT DETECTION: CONTEXT, TIME AND PLACE MATTERS |
author |
Santini, Rose Marie |
author_facet |
Santini, Rose Marie; Salles, Débora; Ferreira, Fernando; Grael, Felipe |
author_role |
author |
author2 |
Salles, Débora; Ferreira, Fernando; Grael, Felipe |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Santini, Rose Marie; Salles, Débora; Ferreira, Fernando; Grael, Felipe |
dc.subject.por.fl_str_mv |
Bot detection; machine learning algorithm; Brazil; computational propaganda |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-05-05 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/preprint; info:eu-repo/semantics/publishedVersion |
format |
preprint |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://preprints.scielo.org/index.php/scielo/preprint/view/5974; 10.1590/SciELOPreprints.5974 |
url |
https://preprints.scielo.org/index.php/scielo/preprint/view/5974 |
identifier_str_mv |
10.1590/SciELOPreprints.5974 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://preprints.scielo.org/index.php/scielo/article/view/5974/11503 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2023 Rose Marie Santini, Débora Salles, Fernando Ferreira, Felipe Grael; https://creativecommons.org/licenses/by/4.0; info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2023 Rose Marie Santini, Débora Salles, Fernando Ferreira, Felipe Grael; https://creativecommons.org/licenses/by/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
SciELO Preprints |
publisher.none.fl_str_mv |
SciELO Preprints |
dc.source.none.fl_str_mv |
reponame:SciELO Preprints; instname:Scientific Electronic Library Online (SCIELO); instacron:SCI |
instname_str |
Scientific Electronic Library Online (SCIELO) |
instacron_str |
SCI |
institution |
SCI |
reponame_str |
SciELO Preprints |
collection |
SciELO Preprints |
repository.name.fl_str_mv |
SciELO Preprints - Scientific Electronic Library Online (SCIELO) |
repository.mail.fl_str_mv |
scielo.submission@scielo.org |
_version_ |
1797047811629383680 |