Platform for the discovery of newsworthy events in Twitter

Bibliographic details
Main author: Duarte, Fernando José Fradique
Publication date: 2017
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10773/24806
Abstract: The new communication paradigm established by Social Media, along with its growing popularity in recent years, has attracted increasing interest from several research fields. One such field is event detection in Social Media, whose relevance stems from its potential applicability in many diverse applications, among them the detection of newsworthy events. The purpose of this work is therefore to implement a system to detect newsworthy events in Twitter, using a similar system proposed in the literature as its base. To this end, a segmentation algorithm based on dynamic programming was implemented to split the tweets into segments. A weighting scheme that takes into account the burstiness, user support and newsworthiness of the segments was then used to rank them, with Wikipedia leveraged to derive the newsworthiness. The top K segments in this ranking were further processed and clustered into candidate events according to their similarity. These candidate events were then filtered by an SVM model trained on manually annotated data, in order to retain only those related to real-world newsworthy events. The support infrastructure required by the system, namely the precomputed values necessary to its operation, was also implemented. The system was tested with three months of data, a total of 4,770,636 tweets created in Portugal and mostly written in Portuguese. It obtained a precision of 76.9% and a recall of 41.6%.
id RCAP_800d4ff60820e9ddf86f564e3e1e4a4b
oai_identifier_str oai:ria.ua.pt:10773/24806
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
spelling Platform for the discovery of newsworthy events in Twitter; Social Media; Twitter; Event Detection; Machine Learning; Dynamic Programming; 2018-12-05T15:38:36Z; 2017-01-01T00:00:00Z; 2017; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10773/24806; TID:201937344; eng; Duarte, Fernando José Fradique; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-02-22T11:48:28Z; oai:ria.ua.pt:10773/24806; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T02:58:20.725641; Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false
dc.title.none.fl_str_mv Platform for the discovery of newsworthy events in Twitter
title Platform for the discovery of newsworthy events in Twitter
spellingShingle Platform for the discovery of newsworthy events in Twitter
Duarte, Fernando José Fradique
Social Media
Twitter
Event Detection
Machine Learning
Dynamic Programming
title_short Platform for the discovery of newsworthy events in Twitter
title_full Platform for the discovery of newsworthy events in Twitter
title_fullStr Platform for the discovery of newsworthy events in Twitter
title_full_unstemmed Platform for the discovery of newsworthy events in Twitter
title_sort Platform for the discovery of newsworthy events in Twitter
author Duarte, Fernando José Fradique
author_facet Duarte, Fernando José Fradique
author_role author
dc.contributor.author.fl_str_mv Duarte, Fernando José Fradique
dc.subject.por.fl_str_mv Social Media
Twitter
Event Detection
Machine Learning
Dynamic Programming
topic Social Media
Twitter
Event Detection
Machine Learning
Dynamic Programming
description The new communication paradigm established by Social Media, along with its growing popularity in recent years, has attracted increasing interest from several research fields. One such field is event detection in Social Media, whose relevance stems from its potential applicability in many diverse applications, among them the detection of newsworthy events. The purpose of this work is therefore to implement a system to detect newsworthy events in Twitter, using a similar system proposed in the literature as its base. To this end, a segmentation algorithm based on dynamic programming was implemented to split the tweets into segments. A weighting scheme that takes into account the burstiness, user support and newsworthiness of the segments was then used to rank them, with Wikipedia leveraged to derive the newsworthiness. The top K segments in this ranking were further processed and clustered into candidate events according to their similarity. These candidate events were then filtered by an SVM model trained on manually annotated data, in order to retain only those related to real-world newsworthy events. The support infrastructure required by the system, namely the precomputed values necessary to its operation, was also implemented. The system was tested with three months of data, a total of 4,770,636 tweets created in Portugal and mostly written in Portuguese. It obtained a precision of 76.9% and a recall of 41.6%.
publishDate 2017
dc.date.none.fl_str_mv 2017-01-01T00:00:00Z
2017
2018-12-05T15:38:36Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/24806
TID:201937344
url http://hdl.handle.net/10773/24806
identifier_str_mv TID:201937344
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137638031032320
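
Note on the segmentation step: the abstract states that tweets are split into segments with a dynamic programming algorithm, but this record does not give the scoring function the thesis uses. The sketch below is therefore only an illustration of the general technique, not the author's implementation. The function name `segment_tweet`, the `max_len` limit, the `KNOWN_PHRASES` lexicon and the toy scoring rule are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, ...]

# Hypothetical lexicon of multi-word phrases (an assumption for this sketch;
# the abstract mentions Wikipedia only as a source for newsworthiness).
KNOWN_PHRASES = {"champions league"}


def toy_score(segment: Segment) -> float:
    """Toy scoring rule (assumed): reward known phrases, penalise unknown ones."""
    if len(segment) == 1:
        return 1.0
    if " ".join(segment) in KNOWN_PHRASES:
        return 2.0 * len(segment)
    return 0.0


def segment_tweet(
    tokens: Sequence[str],
    score: Callable[[Segment], float] = toy_score,
    max_len: int = 3,
) -> List[Segment]:
    """Split tokens into consecutive segments that maximise the summed score."""
    n = len(tokens)
    # best[i] holds (best total score, segmentation) for the first i tokens.
    best: List[Tuple[float, List[Segment]]] = [(0.0, [])]
    best += [(float("-inf"), [])] * n
    for i in range(1, n + 1):
        # Try every segment of at most max_len tokens that ends at position i.
        for j in range(max(0, i - max_len), i):
            candidate = tuple(tokens[j:i])
            total = best[j][0] + score(candidate)
            if total > best[i][0]:
                best[i] = (total, best[j][1] + [candidate])
    return best[n][1]


if __name__ == "__main__":
    print(segment_tweet("benfica wins the champions league".split()))
```

Each prefix of the tweet is solved once and reused, so the cost grows with the number of tokens times `max_len` rather than with the number of possible segmentations.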