Platform for the discovery of newsworthy events in Twitter
Main author: | Duarte, Fernando José Fradique |
---|---|
Publication date: | 2017 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10773/24806 |
Abstract: | The new communication paradigm established by Social Media, along with its growing popularity in recent years, has attracted increasing interest from several research fields. One of these is event detection in Social Media, whose relevance stems from its potential applicability across many diverse applications, among them the detection of newsworthy events. The purpose of this work is therefore to implement a system that detects newsworthy events in Twitter, using a similar system proposed in the literature as its base. To this end, a segmentation algorithm based on dynamic programming was implemented to split tweets into segments. A weighting scheme that takes into account the burstiness, user support and newsworthiness of the segments was then used to rank them, with Wikipedia leveraged to derive the newsworthiness. The top K segments in this ranking were further processed and clustered into candidate events according to their similarity. These candidate events were then filtered by an SVM model trained on manually annotated data, in order to retain only those related to real-world newsworthy events. The support infrastructure required by the system, namely the precomputed values necessary for its operation, was also implemented. The system was tested with three months of data, a total of 4,770,636 tweets created in Portugal and mostly written in Portuguese. It achieved a precision of 76.9% and a recall of 41.6%. |
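
The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes a tweet is already tokenized and that each candidate segment (a run of up to `max_len` consecutive tokens) receives a numeric score; dynamic programming then picks the split with the highest total score. The function name `segment_tweet`, the `max_len` parameter, the phrase table and `toy_score` are hypothetical stand-ins for illustration only; the record does not specify the scoring function actually used in the dissertation.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, ...]


def segment_tweet(
    tokens: Sequence[str],
    score: Callable[[Segment], float],
    max_len: int = 3,
) -> List[Segment]:
    """Split tokens into contiguous segments that maximize the summed score."""
    n = len(tokens)
    # best[i] holds the best total score for tokens[:i];
    # back[i] holds the start index of the last segment in that best split.
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score(tuple(tokens[start:end]))
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back pointers from the end to recover the chosen segments.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(tokens[start:end]))
        end = start
    segments.reverse()
    return segments


# Hypothetical phrase table standing in for a real segment cohesion score.
KNOWN_PHRASES = {("cristiano", "ronaldo"): 3.0, ("marca", "golo"): 2.5}


def toy_score(segment: Segment) -> float:
    # Assumption: single words score 1.0, known phrases score higher,
    # and any other multi-word segment scores 0.0.
    if len(segment) == 1:
        return 1.0
    return KNOWN_PHRASES.get(segment, 0.0)


tokens = "cristiano ronaldo marca golo em lisboa".split()
result = segment_tweet(tokens, toy_score)
print(result)
```

With this toy scoring, the two known phrases are kept together and the remaining words become single-word segments. Bounding segment length with `max_len` keeps the search at roughly `len(tokens) * max_len` score evaluations per tweet.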
id |
RCAP_800d4ff60820e9ddf86f564e3e1e4a4b |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/24806 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Platform for the discovery of newsworthy events in Twitter |
author |
Duarte, Fernando José Fradique |
author_role |
author |
dc.contributor.author.fl_str_mv |
Duarte, Fernando José Fradique |
dc.subject.por.fl_str_mv |
Social Media; Twitter; Event Detection; Machine Learning; Dynamic Programming |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-01-01T00:00:00Z 2017 2018-12-05T15:38:36Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/24806 TID:201937344 |
url |
http://hdl.handle.net/10773/24806 |
identifier_str_mv |
TID:201937344 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817543695551954944 |