Twitter user geolocation using web country noun searches

Bibliographic details
Main author: Zola, Paola
Publication date: 2019
Other authors: Cortez, Paulo, Carpita, Maurizio
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/1822/62749
Abstract: Several Web and social media analytics require user geolocation data. Although Twitter is a powerful source for social media analytics, its user geolocation is a nontrivial task. This paper presents a purely word distribution method for Twitter user country geolocation. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (GTN method). Several experiments were conducted, using a recently created dataset of 744,830 tweets produced by 3298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art world distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that is capable of matching the GTN responses with an 80% accuracy while being much faster than GTN.
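The abstract says that tweet noun frequencies are matched against per-country search distributions, but this record gives no algorithmic detail. The sketch below is only a rough illustration of that general idea and is not the authors' GTN implementation: the function names, the frequency-weighted scoring rule, and the toy interest values are all assumptions made up for this example, and no Google Trends query is performed.

```python
from collections import Counter


def score_countries(user_nouns, country_interest):
    """Sum each noun's normalized country distribution, weighted by noun frequency.

    user_nouns: list of nouns extracted from one user's tweets (hypothetical input).
    country_interest: dict mapping noun -> {country_code: interest value}
                      (hypothetical stand-in for search-interest data).
    """
    noun_counts = Counter(user_nouns)
    scores = {}
    for noun, count in noun_counts.items():
        distribution = country_interest.get(noun, {})
        total = sum(distribution.values())
        if total == 0:
            continue  # no interest data for this noun, so it cannot vote
        for country, interest in distribution.items():
            # Normalize so that every noun contributes `count` units in total.
            scores[country] = scores.get(country, 0.0) + count * interest / total
    return scores


def predict_country(user_nouns, country_interest):
    """Return the highest-scoring country code, or None if nothing could be scored."""
    scores = score_countries(user_nouns, country_interest)
    return max(scores, key=scores.get) if scores else None


# Made-up interest values, for illustration only.
toy_interest = {
    "calcio": {"IT": 100, "BR": 5, "PT": 3},
    "praia": {"PT": 60, "BR": 100},
    "futebol": {"PT": 80, "BR": 100, "IT": 2},
}
toy_nouns = ["praia", "futebol", "praia", "calcio"]
predicted = predict_country(toy_nouns, toy_interest)
print(predicted)
```

Normalizing each noun's distribution before weighting is one possible design choice: it keeps a single noun with very large raw interest values from dominating the score. The paper's actual statistical matching may differ.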
id RCAP_fbb2f5e225ba0c766c457362b41930b9
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/62749
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Acknowledgments: Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT - Fundacao para a Ciencia e Tecnologia within the Project Scope UID/CEC/00319/2019. We would also like to thank the anonymous reviewers for their helpful suggestions.
dc.contributor.none.fl_str_mv Universidade do Minho
dc.subject.por.fl_str_mv Country geolocation
Google Trends
Machine learning
Natural language processing
Twitter
Natural Sciences::Computer and Information Sciences
Science & Technology
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv 0167-9236
10.1016/j.dss.2019.03.006
https://www.sciencedirect.com/science/article/pii/S0167923619300442
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier Science BV
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
_version_ 1799132876058394624