Twitter Observatory: developing tools to recover and classify information for the social network Twitter

Elias, Constança Machado Aires Lobo

Bibliographic details
Main author:	Elias, Constança Machado Aires Lobo
Publication date:	2022
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://hdl.handle.net/1822/84069
Abstract:	Master's dissertation in Informatics Engineering

Item metadata

id	RCAP_ccfaf3bec6e9bba7100059148193589e
oai_identifier_str	oai:repositorium.sdum.uminho.pt:1822/84069
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Twitter Observatory: developing tools to recover and classify information for the social network Twitter | Twitter Observatory: desenvolvimento de ferramentas para recolha e classificação de informação da rede social Twitter | Twitter | Classificação de documentos | Deep Learning | Língua portuguesa | Document classification | Portuguese language | Master's dissertation in Informatics Engineering | Social media have become the new form of communication and, therefore, an important source of information. More specifically, Twitter has, since its foundation, become one of the most used social media platforms. Its popularity enabled the creation of an enormous amount of content, and a lot of research has been done using Twitter in different areas, such as health and politics. In the text mining field, document classification has been applied to Twitter to analyse trends, understand human behaviour, or predict certain events. However, it is not always possible to have the desired datasets to perform the classification and analysis. To solve this problem, this dissertation, proposed by OmniumAI, aims to explore existing approaches to extract and classify Twitter data, in particular regarding the Portuguese language. To that end, an API was developed that is capable of extracting tweets according to a given topic of interest and of creating datasets automatically classified with relevance labels. A tweet classification pipeline was also developed, based on the Deep Learning approaches found in the state of the art for document classification. The final product is a framework, Twitter Observatory, that allows users to create datasets according to a particular topic of interest and analyse those datasets. To test the developed framework, two case studies were selected: COVID-19 and the Russian invasion of Ukraine in 2022. For these two topics, two datasets were extracted and automatically labelled according to the relevance of the tweets, containing, respectively, 2,268,575 and 219,887 tweets in Portuguese. An exploratory analysis of this data was performed, and the classification results obtained with Deep Learning models are presented. To validate those results, an existing dataset, CrisisLex, translated into Portuguese, was used. | Rocha, Miguel | Pereira, Vítor | Universidade do Minho | Elias, Constança Machado Aires Lobo | 2022-12-19 | 2022-12-19T00:00:00Z | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/masterThesis | application/pdf | https://hdl.handle.net/1822/84069 | eng | 203252306 | info:eu-repo/semantics/openAccess | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) | instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | instacron:RCAAP | 2024-05-11T04:34:20Z | oai:repositorium.sdum.uminho.pt:1822/84069 | Portal Agregador | ONG | https://www.rcaap.pt/oai/openaire | mluisa.alvim@gmail.com | opendoar:7160 | 2024-05-11T04:34:20 | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | false
dc.title.none.fl_str_mv	Twitter Observatory: developing tools to recover and classify information for the social network Twitter | Twitter Observatory: desenvolvimento de ferramentas para recolha e classificação de informação da rede social Twitter
title	Twitter Observatory: developing tools to recover and classify information for the social network Twitter
spellingShingle	Twitter Observatory: developing tools to recover and classify information for the social network Twitter | Elias, Constança Machado Aires Lobo | Twitter | Classificação de documentos | Deep Learning | Língua portuguesa | Document classification | Portuguese language
title_short	Twitter Observatory: developing tools to recover and classify information for the social network Twitter
title_full	Twitter Observatory: developing tools to recover and classify information for the social network Twitter
title_fullStr	Twitter Observatory: developing tools to recover and classify information for the social network Twitter
title_full_unstemmed	Twitter Observatory: developing tools to recover and classify information for the social network Twitter
title_sort	Twitter Observatory: developing tools to recover and classify information for the social network Twitter
author	Elias, Constança Machado Aires Lobo
author_facet	Elias, Constança Machado Aires Lobo
author_role	author
dc.contributor.none.fl_str_mv	Rocha, Miguel | Pereira, Vítor | Universidade do Minho
dc.contributor.author.fl_str_mv	Elias, Constança Machado Aires Lobo
dc.subject.por.fl_str_mv	Twitter | Classificação de documentos | Deep Learning | Língua portuguesa | Document classification | Portuguese language
topic	Twitter | Classificação de documentos | Deep Learning | Língua portuguesa | Document classification | Portuguese language
description	Master's dissertation in Informatics Engineering
publishDate	2022
dc.date.none.fl_str_mv	2022-12-19 | 2022-12-19T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/1822/84069
url	https://hdl.handle.net/1822/84069
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	203252306
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) | instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv	mluisa.alvim@gmail.com
_version_	1817544352501596160