Social Media Text Processing and Semantic Analysis for Smart Cities
Main author: | João Filipe Figueiredo Pereira |
---|---|
Publication date: | 2017 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/10216/105910 |
Abstract: | With the rise of social media, people obtain and share information almost instantly, 24/7. Many research areas have tried to extract valuable insights from these large volumes of freely available user-generated content, and intelligent transportation systems and smart cities are no exception. However, extracting meaningful and actionable knowledge from user-generated content is a complex endeavour. First, each social media service has its own data collection specificities and constraints. Second, the volume of messages/posts produced can overwhelm automatic processing and mining. Last but not least, social media texts are usually short and informal, with many abbreviations, jargon, slang and idioms. In this thesis, we try to tackle some of these challenges with the goal of extracting knowledge from social media streams that might be useful in the context of intelligent transportation systems and smart cities. We designed and developed a framework for the collection, processing and mining of geo-located tweets. More specifically, it provides functionality for the parallel collection of geo-located tweets from multiple pre-defined bounding boxes (cities or regions), including the filtering of non-complying tweets; text pre-processing for Portuguese and English; topic modelling; transportation-specific text classifiers; and aggregation and data visualisation. We performed empirical studies and implemented illustrative examples for five cities (Rio de Janeiro, São Paulo, New York City, London and Melbourne), comprising a total of more than X million tweets over a period of 3 months. The topic modelling and text classifiers were evaluated with manually labelled data created specifically for this work. Both the software and the gold standard data will be made publicly available to foster further developments by the research community. |
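
The abstract mentions collecting geo-located tweets from pre-defined bounding boxes and filtering out non-complying tweets. The following is a minimal sketch of such a filter, not the thesis's implementation. It assumes that each tweet carries a GeoJSON-style point under `coordinates` with `[longitude, latitude]` order, and that each city is an axis-aligned box given as `(west, south, east, north)`. The box values are rough illustrative approximations, and `in_bbox`, `assign_city` and `filter_tweets` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Bounding box as (west, south, east, north) in decimal degrees.
BBox = Tuple[float, float, float, float]

# Illustrative, approximate boxes only (assumption); not the thesis's boxes.
CITY_BOXES: Dict[str, BBox] = {
    "London": (-0.51, 51.28, 0.33, 51.69),
    "New York City": (-74.26, 40.49, -73.70, 40.92),
}


def in_bbox(lon: float, lat: float, box: BBox) -> bool:
    """Return True if the point (lon, lat) lies inside the box, edges inclusive."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def assign_city(tweet: dict, boxes: Dict[str, BBox]) -> Optional[str]:
    """Return the first city whose box contains the tweet's point, or None.

    Assumes tweet["coordinates"] is either None or {"coordinates": [lon, lat]}.
    """
    coords = (tweet.get("coordinates") or {}).get("coordinates")
    if not coords or len(coords) != 2:
        return None  # no exact point available: treated as non-complying here
    lon, lat = coords
    for city, box in boxes.items():
        if in_bbox(lon, lat, box):
            return city
    return None


def filter_tweets(tweets: Iterable[dict], boxes: Dict[str, BBox]) -> List[Tuple[str, dict]]:
    """Keep only tweets that fall inside one of the boxes, tagged with the city name."""
    kept = []
    for tweet in tweets:
        city = assign_city(tweet, boxes)
        if city is not None:
            kept.append((city, tweet))
    return kept


# Small demo with made-up tweets.
sample = [
    {"text": "Tube delays again", "coordinates": {"type": "Point", "coordinates": [-0.12, 51.50]}},
    {"text": "no geotag", "coordinates": None},
    {"text": "Paris", "coordinates": {"type": "Point", "coordinates": [2.35, 48.85]}},
]
print([(city, t["text"]) for city, t in filter_tweets(sample, CITY_BOXES)])
# [('London', 'Tube delays again')]
```

Treating tweets without an exact point as non-complying is one possible policy chosen for this sketch. A collector could instead fall back to a coarser place-level location, at the cost of spatial precision.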
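
The abstract notes that tweets are short and informal, full of abbreviations and slang, and that the framework pre-processes Portuguese and English text. The following sketch shows common generic normalisation steps for tweets; it is not the thesis's pipeline. It lowercases the text, strips URLs and @mentions, keeps hashtag words, expands abbreviations from a lookup table and drops stopwords. The tiny `ABBREVIATIONS` and `STOPWORDS` tables and the `preprocess` function are hypothetical placeholders for curated per-language resources.

```python
import re
import unicodedata
from typing import Dict, List, Set

# Hypothetical, tiny lookup tables for illustration only (assumption).
# A real pipeline would choose per-language resources after language detection.
ABBREVIATIONS: Dict[str, str] = {"u": "you", "pls": "please", "vc": "você", "tb": "também"}
STOPWORDS: Set[str] = {"the", "a", "is", "on", "de", "o", "e", "em"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"\w+")  # \w is Unicode-aware for str patterns in Python 3


def preprocess(text: str) -> List[str]:
    """Normalise a tweet into a list of lowercase tokens."""
    text = unicodedata.normalize("NFC", text).lower()
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    text = text.replace("#", " ")      # keep hashtag words, drop the symbol
    tokens = TOKEN_RE.findall(text)
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in tokens]  # expand abbreviations
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("@TfL pls fix the #Tube delays https://t.co/abc"))
# ['please', 'fix', 'tube', 'delays']
```

Expanding abbreviations before removing stopwords lets an expanded form be filtered if it happens to be a stopword. Keeping hashtag words preserves useful signals such as transport line names.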

id | RCAP_242e1f45efb9ec160eb8ff7b14b4a648 |
---|---|
oai_identifier_str | oai:repositorio-aberto.up.pt:10216/105910 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
author | João Filipe Figueiredo Pereira |
dc.subject.por.fl_str_mv | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
dc.date.none.fl_str_mv | 2017-07-14 (2017-07-14T00:00:00Z) |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv | https://hdl.handle.net/10216/105910; TID:201798921 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |