Urban Transport Evaluation Using Knowledge Extracted from Social Media

Francisco André Barreiros Murçós

Bibliographic details
Main author:	Francisco André Barreiros Murçós
Publication date:	2021
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://hdl.handle.net/10216/137319
Abstract:	Public opinion is now a valuable data source for many sectors. In the transportation and mobility sector, it allows information to be collected in real time at lower cost than other methods of information extraction. In this dissertation, we define a methodology for extracting knowledge from messages collected from Twitter in order to analyse urban mobility. The methodology is structured in three main modules: system configuration, data analytics and visualization. The messages used to demonstrate the proposed methodology were collected over two months for three cities: New York, London and Melbourne. Extracting and analysing text from social media is very time-consuming because of the volume of messages produced. Messages extracted from Twitter are typically short and informal, with a lot of slang and misspellings. To deal with this, NLP (Natural Language Processing) techniques were applied using NLTK (Natural Language Toolkit), so that the text could be cleaned and made understandable to the algorithm. For the classification of travel-related messages, a BERT (Bidirectional Encoder Representations from Transformers) embedding model was used. The model is pre-trained, unsupervised and was released in 2018. To find out whether a simple model could perform well, a unigram approach was used, with three lists of travel-related words: (i) a small list of 10 words, (ii) a medium list of 35 words and (iii) a large list of 344 words. The results show high model performance, with precision above 0.80 and accuracy above 0.90. Popular words are train, walk, street, car, station and avenue. Consistent results were obtained for all three cities assessed. To evaluate public opinion, the messages related to transportation and mobility were classified according to their sentiment. The polarity of the messages (positive, neutral or negative) was evaluated with the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment tool. VADER is easy to use and well suited to social media messages and informal texts. It is a lexicon- and rule-based tool that calculates a compound value for the emotion of a text from its words. The methodology performed well on sentiment analysis: average precision was 0.77, while recall, accuracy and F1-score were around 0.78. A specific analysis was made of a car crash in New York on May 18, 2017. This analysis demonstrates that the methodology is capable of recognizing spatial changes and mobility flows, pointing to the potential causes behind them. We conclude that the proposed methodology can be very helpful to transport engineers, urban planners, researchers and policymakers in gaining insight into public opinion on urban mobility.
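The two simplest steps named in the abstract, unigram matching against a word list and turning a compound sentiment value into a polarity label, can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the word list holds only the six popular words mentioned in the abstract (the actual 10-, 35- and 344-word lists are not given in this record), the function names are hypothetical, and the 0.05 cut-off is the threshold conventionally suggested for VADER compound scores, which the dissertation may or may not have used.

```python
import re

# Assumption: a stand-in vocabulary built from the "popular words" listed in
# the abstract. The dissertation's real word lists are not reproduced here.
TRAVEL_WORDS = {"train", "walk", "street", "car", "station", "avenue"}


def tokenize(message):
    """Lowercase a message and split it into alphabetic unigrams."""
    return re.findall(r"[a-z]+", message.lower())


def is_travel_related(message, vocabulary=TRAVEL_WORDS):
    """Flag a message as travel-related if any unigram is in the vocabulary.

    No stemming is applied, so "cars" does not match "car".
    """
    return any(token in vocabulary for token in tokenize(message))


def polarity(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to positive, neutral or negative.

    Assumption: the symmetric 0.05 threshold is the conventional VADER
    cut-off, not a value confirmed by the dissertation.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

In practice the compound value would come from a sentiment tool such as VADER; here it is passed in directly so that the sketch has no external dependencies.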

Item metadata

id	RCAP_58ff403424cd479e820ed4e0ada04bbe
oai_identifier_str	oai:repositorio-aberto.up.pt:10216/137319
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Urban Transport Evaluation Using Knowledge Extracted from Social Media
title	Urban Transport Evaluation Using Knowledge Extracted from Social Media
author	Francisco André Barreiros Murçós
author_facet	Francisco André Barreiros Murçós
author_role	author
dc.contributor.author.fl_str_mv	Francisco André Barreiros Murçós
dc.subject.por.fl_str_mv	Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering
topic	Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering
publishDate	2021
dc.date.none.fl_str_mv	2021-10-13 2021-10-13T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10216/137319 TID:202819884
url	https://hdl.handle.net/10216/137319
identifier_str_mv	TID:202819884
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799135999391956992
