Social Media Text Processing and Semantic Analysis for Smart Cities

Bibliographic details
Main author: João Filipe Figueiredo Pereira
Publication date: 2017
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/10216/105910
Abstract: With the rise of social media, people obtain and share information almost instantly on a 24/7 basis. Many research areas have tried to extract valuable insights from these large volumes of freely available user-generated content. The research areas of intelligent transportation systems and smart cities are no exception. However, extracting meaningful and actionable knowledge from user-generated content is a complex endeavour. First, each social media service has its own data collection specificities and constraints; second, the volume of messages and posts produced can be overwhelming for automatic processing and mining; and last but not least, social media texts are usually short and informal, with many abbreviations, jargon, slang and idioms. In this thesis, we try to tackle some of these challenges with the goal of extracting knowledge from social media streams that might be useful in the context of intelligent transportation systems and smart cities. We designed and developed a framework for the collection, processing and mining of geo-located tweets. More specifically, it provides functionalities for parallel collection of geo-located tweets from multiple pre-defined bounding boxes (cities or regions), including filtering of non-complying tweets, text pre-processing for Portuguese and English, topic modelling, and transportation-specific text classifiers, as well as aggregation and data visualisation. We performed empirical studies and implemented illustrative examples for five cities: Rio de Janeiro, São Paulo, New York City, London and Melbourne, comprising a total of more than X million tweets over a period of 3 months. The topic modelling and text classifiers were evaluated with manually labelled data created specifically for this work. Both the software and the gold standard data will be made publicly available to foster further developments by the research community.
id RCAP_242e1f45efb9ec160eb8ff7b14b4a648
oai_identifier_str oai:repositorio-aberto.up.pt:10216/105910
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Social Media Text Processing and Semantic Analysis for Smart Cities
title Social Media Text Processing and Semantic Analysis for Smart Cities
author João Filipe Figueiredo Pereira
author_facet João Filipe Figueiredo Pereira
author_role author
dc.contributor.author.fl_str_mv João Filipe Figueiredo Pereira
dc.subject.por.fl_str_mv Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
topic Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
publishDate 2017
dc.date.none.fl_str_mv 2017-07-14
2017-07-14T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/105910
TID:201798921
url https://hdl.handle.net/10216/105910
identifier_str_mv TID:201798921
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799135849542057985