Revealing social networks' missed behavior: detecting reactions and time-aware analyses

Bibliographic details
Main author: Barbosa Neto, Samuel Martins
Publication date: 2017
Document type: Thesis
Language: English
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: http://www.teses.usp.br/teses/disponiveis/45/45134/tde-24082017-000227/
Abstract: Online communities provide a fertile ground for analyzing people's behavior and improving our understanding of social processes. For instance, when modeling social interaction online, it is important to understand when people are reacting to each other. Also, since both people and communities change over time, we argue that analyses of online communities that take time into account will lead to deeper and more accurate results. In many cases, however, users' behavior can easily be missed: users react to content in many more ways than explicit indicators (such as likes on Facebook or replies on Twitter) capture, and poorly aggregated temporal data can hide or misrepresent how users evolve, and even lead to wrong conclusions about it. To address the problem of detecting non-explicit responses, we present a new approach that uses tf-idf similarity between a user's own tweets and recent tweets by the people they follow. Based on a month's worth of posting data from 449 ego networks on Twitter, this method shows that at least 11% of reactions are likely not captured by the explicit reply and retweet mechanisms. Furthermore, these uncaptured reactions are not evenly distributed among users: some users, who create replies and retweets without using the official interface mechanisms, are much more responsive to their followees than they appear. This suggests that detecting non-explicit responses is an important consideration for mitigating biases and building more accurate models when these markers are used to study social interaction and information diffusion.

We also address the problem of user evolution on Reddit, based on comment and submission data from 2007 to 2014. Even using one of the simplest temporal distinctions between users (yearly cohorts), we find wide differences in people's behavior, including comment activity, effort, and survival. Furthermore, not accounting for time can lead us to misinterpret important phenomena. For instance, we observe that average comment length decreases over any fixed period of time, but comment length in each cohort of users steadily increases during the same period after an abrupt initial drop, an example of Simpson's paradox. Dividing cohorts into sub-cohorts based on survival time in the community provides further insights; in particular, longer-lived users start at a higher activity level and make more, and shorter, comments than those who leave earlier. These findings give more insight into user evolution on Reddit in particular, and raise a number of interesting questions about studying online behavior going forward.
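The reaction-detection idea summarized in the abstract (tf-idf similarity between a user's own tweets and recent tweets by the people they follow) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the tokenizer, the smoothed idf formula, the 24-hour window, the 0.3 similarity threshold, and the names `tfidf_vectors` and `likely_reactions` are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter
from datetime import datetime, timedelta


def tokenize(text):
    # Simple lowercase word tokenizer (assumption; real tweet processing
    # would also handle URLs, emoji, stop words, etc.).
    return re.findall(r"[a-z0-9#@']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Raw term count times a smoothed idf: log((1 + n) / (1 + df)) + 1.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def likely_reactions(user_tweets, followee_tweets,
                     window=timedelta(hours=24), threshold=0.3):
    """Flag user tweets that closely resemble a followee tweet posted shortly before.

    Both inputs are lists of (datetime, text). Returns
    (user_text, followee_text, similarity) triples.
    Window and threshold defaults are illustrative assumptions.
    """
    docs = [text for _, text in user_tweets] + [text for _, text in followee_tweets]
    vectors = tfidf_vectors(docs)
    user_vecs = vectors[:len(user_tweets)]
    followee_vecs = vectors[len(user_tweets):]

    matches = []
    for (u_time, u_text), u_vec in zip(user_tweets, user_vecs):
        best_score, best_text = 0.0, None
        for (f_time, f_text), f_vec in zip(followee_tweets, followee_vecs):
            # Only followee tweets posted within the window before the user's tweet count.
            if u_time - window <= f_time < u_time:
                score = cosine(u_vec, f_vec)
                if score > best_score:
                    best_score, best_text = score, f_text
        if best_text is not None and best_score >= threshold:
            matches.append((u_text, best_text, round(best_score, 3)))
    return matches


# Hypothetical usage with made-up tweets.
t0 = datetime(2017, 1, 1, 12, 0)
followees = [
    (t0, "New paper on Simpson's paradox in Reddit cohorts"),
    (t0 + timedelta(minutes=5), "Great coffee this morning"),
]
user = [
    (t0 + timedelta(hours=1), "Reading that paper on Simpson's paradox and Reddit cohorts now"),
    (t0 + timedelta(hours=2), "Traffic is terrible today"),
]
for match in likely_reactions(user, followees):
    print(match)
```

Fitting idf over the user's and followees' tweets together gives low weight to terms shared by many tweets, so a match is driven by rarer shared vocabulary; on real data the window and threshold would need tuning.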
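The Simpson's paradox described in the abstract (pooled comment length falls while each yearly cohort's comment length rises) becomes visible once comments are grouped by cohort. The sketch below uses a small invented dataset, not the thesis's Reddit data; the record layout and the definition of a cohort as the year of a user's first activity are assumptions for illustration, and the sketch omits the abrupt initial drop the abstract mentions.

```python
from collections import defaultdict
from statistics import mean

# Toy records: (cohort_year, activity_year, comment_length).
# cohort_year = year of the user's first activity (assumed cohort definition).
# The numbers are invented to reproduce the shape of the paradox only.
COMMENTS = (
    [(2008, 2008, 100)] * 2 + [(2008, 2009, 110)] + [(2008, 2010, 120)]
    + [(2009, 2009, 40)] * 4 + [(2009, 2010, 50)] * 4
    + [(2010, 2010, 20)] * 10
)


def mean_length_by_year(comments):
    """Aggregate view: mean comment length per activity year, all users pooled."""
    groups = defaultdict(list)
    for _cohort, year, length in comments:
        groups[year].append(length)
    return {year: mean(lengths) for year, lengths in sorted(groups.items())}


def mean_length_by_cohort(comments):
    """Time-aware view: mean comment length per cohort and activity year."""
    groups = defaultdict(lambda: defaultdict(list))
    for cohort, year, length in comments:
        groups[cohort][year].append(length)
    return {
        cohort: {year: mean(lengths) for year, lengths in sorted(years.items())}
        for cohort, years in sorted(groups.items())
    }


overall = mean_length_by_year(COMMENTS)
per_cohort = mean_length_by_cohort(COMMENTS)
print("pooled:", {y: round(m, 1) for y, m in overall.items()})
for cohort, series in per_cohort.items():
    print(f"cohort {cohort}:", {y: round(m, 1) for y, m in series.items()})
```

In this toy data the pooled mean falls from 100 to 54 to about 34.7, while the 2008 cohort rises from 100 to 110 to 120 and the 2009 cohort from 40 to 50. The reversal happens because newer cohorts write shorter comments and make up a growing share of all comments, pulling the pooled mean down even though every cohort's own mean rises.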
id USP_5e851908f69f7c1cea596579147f0361
oai_identifier_str oai:teses.usp.br:tde-24082017-000227
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Revealing social networks' missed behavior: detecting reactions and time-aware analyses
Revelando o comportamento perdido em redes sociais: detectando reações e análises temporais
dc.contributor.none.fl_str_mv Cesar Junior, Roberto Marcondes
Pinhanez, Claudio Santos
dc.subject.por.fl_str_mv Reddit
Simpson's paradox
Social network
Twitter
User behavior
dc.date.none.fl_str_mv 2017-05-29
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.identifier.uri.fl_str_mv http://www.teses.usp.br/teses/disponiveis/45/45134/tde-24082017-000227/
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br, atendimento@aguia.usp.br