Twitter user geolocation using web country noun searches
Autor(a) principal: | |
---|---|
Data de Publicação: | 2019 |
Outros Autores: | , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Texto Completo: | http://hdl.handle.net/1822/62749 |
Resumo: | Several Web and social media analytics require user geolocation data. Although Twitter is a powerful source for social media analytics, its user geolocation is a nontrivial task. This paper presents a purely word distribution method for Twitter user country geolocation. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (GTN method). Several experiments were conducted, using a recently created dataset of 744,830 tweets produced by 3298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art world distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that is capable of matching the GTN responses with an 80% accuracy while being much faster than GTN. |
id |
RCAP_fbb2f5e225ba0c766c457362b41930b9 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/62749 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Twitter user geolocation using web country noun searchesCountry geolocationGoogle TrendsMachine learningNatural language processingTwitterCiências Naturais::Ciências da Computação e da InformaçãoScience & TechnologySeveral Web and social media analytics require user geolocation data. Although Twitter is a powerful source for social media analytics, its user geolocation is a nontrivial task. This paper presents a purely word distribution method for Twitter user country geolocation. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (GTN method). Several experiments were conducted, using a recently created dataset of 744,830 tweets produced by 3298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art world distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that is capable of matching the GTN responses with an 80% accuracy while being much faster than GTN.Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT - Fundacao para a Ciencia e Tecnologia within the Project Scope UID/CEC/00319/2019. We would also like to thank the anonymous reviewers for their helpful suggestions.Elsevier Science BVUniversidade do MinhoZola, PaolaCortez, PauloCarpita, Maurizio20192019-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://hdl.handle.net/1822/62749eng0167-923610.1016/j.dss.2019.03.006https://www.sciencedirect.com/science/article/pii/S0167923619300442info:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-07-21T12:38:40Zoai:repositorium.sdum.uminho.pt:1822/62749Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T19:35:11.170027Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse |
dc.title.none.fl_str_mv |
Twitter user geolocation using web country noun searches |
title |
Twitter user geolocation using web country noun searches |
spellingShingle |
Twitter user geolocation using web country noun searches Zola, Paola Country geolocation Google Trends Machine learning Natural language processing Ciências Naturais::Ciências da Computação e da Informação Science & Technology |
title_short |
Twitter user geolocation using web country noun searches |
title_full |
Twitter user geolocation using web country noun searches |
title_fullStr |
Twitter user geolocation using web country noun searches |
title_full_unstemmed |
Twitter user geolocation using web country noun searches |
title_sort |
Twitter user geolocation using web country noun searches |
author |
Zola, Paola |
author_facet |
Zola, Paola Cortez, Paulo Carpita, Maurizio |
author_role |
author |
author2 |
Cortez, Paulo Carpita, Maurizio |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Zola, Paola Cortez, Paulo Carpita, Maurizio |
dc.subject.por.fl_str_mv |
Country geolocation Google Trends Machine learning Natural language processing Ciências Naturais::Ciências da Computação e da Informação Science & Technology |
topic |
Country geolocation Google Trends Machine learning Natural language processing Ciências Naturais::Ciências da Computação e da Informação Science & Technology |
description |
Several Web and social media analytics require user geolocation data. Although Twitter is a powerful source for social media analytics, its user geolocation is a nontrivial task. This paper presents a purely word distribution method for Twitter user country geolocation. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (GTN method). Several experiments were conducted, using a recently created dataset of 744,830 tweets produced by 3298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art world distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that is capable of matching the GTN responses with an 80% accuracy while being much faster than GTN. |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019 2019-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1822/62749 |
url |
http://hdl.handle.net/1822/62749 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
0167-9236 10.1016/j.dss.2019.03.006 https://www.sciencedirect.com/science/article/pii/S0167923619300442 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier Science BV |
publisher.none.fl_str_mv |
Elsevier Science BV |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799132876058394624 |