Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers

Bibliographic details
Main author: Zola, Paola
Publication date: 2021
Other authors: Cortez, Paulo; Brentari, Eugenio
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/1822/73557
Abstract: This paper addresses the nontrivial task of Twitter financial disambiguation (TFD), which is relevant to filter financial domain tweets (e.g., alloy steel or coffee prices) when no unique identifiers (e.g., cashtags) are adopted. To automate TFD, we propose a transfer learning approach that uses freely labeled news titles to train diverse one-class and two-class classification methods. These include different text handling transforms, adaptations of statistical measures and modern machine learning methods, including support vector machines (SVM), deep autoencoders and multilayer perceptrons. As a case study, we analyzed the domain of alloy steel prices, collecting a recent Twitter dataset. Overall, the best results were achieved by a two-class SVM fed with TFD statistical measures and topic model features, obtaining an 80% and 71% discrimination level when tested with 11,081 and 3,000 manually labeled tweets. The best one-class performance (78% and 69% for the same test tweets) was obtained by a term frequency-inverse document frequency classifier (TF-IDFC). These models were further used to generate a Financial User Relevance rank (FUR) score, aiming to filter relevant users. The SVM and TF-IDFC FUR models obtained a predictive user discrimination level of 80% and 75% when tested with a manually labeled test sample of 418 users. These results confirm the proposed joint TFD-FUR approach as a valuable tool for the selection of Twitter texts and users for financial social media analytics (e.g., sentiment analysis, detection of influential users).
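Illustrative sketch (not from the paper): the abstract describes training classifiers on labeled news titles and then applying them to tweets. The Python example below shows that general idea with a deliberately simple stand-in, a two-class TF-IDF nearest-centroid classifier, rather than the SVM, TF-IDFC or autoencoder models the authors evaluated. The toy titles and tweets, all function names, and the `relevant_share` user score are assumptions made for illustration; in particular, the abstract does not give the FUR formula, so `relevant_share` is only a hypothetical proxy for a user-level relevance score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency over all training titles."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalized TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(texts, idf):
    """Mean TF-IDF vector of one class of titles."""
    total = {}
    for text in texts:
        for term, weight in tfidf(text, idf).items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(texts) for term, weight in total.items()}


def dot(a, b):
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def is_relevant(tweet, idf, pos_centroid, neg_centroid):
    """True when the tweet is closer to the in-domain titles than to the others."""
    vec = tfidf(tweet, idf)
    return dot(vec, pos_centroid) > dot(vec, neg_centroid)


def relevant_share(tweets, idf, pos_centroid, neg_centroid):
    """Hypothetical user score: fraction of a user's tweets judged relevant."""
    if not tweets:
        return 0.0
    hits = sum(is_relevant(t, idf, pos_centroid, neg_centroid) for t in tweets)
    return hits / len(tweets)


# Toy training data (invented): news titles labeled by the section they came from.
steel_titles = [
    "Alloy steel prices rise as demand grows",
    "Steel mills cut output amid falling prices",
    "Stainless steel market sees price surge",
]
other_titles = [
    "Man of Steel sequel announced by studio",
    "Band releases new album Steel Heart",
    "Pittsburgh Steelers win opening game",
]

idf = fit_idf(steel_titles + other_titles)
pos_c = centroid(steel_titles, idf)
neg_c = centroid(other_titles, idf)

# Toy tweets (invented) from a single hypothetical user.
user_tweets = [
    "steel prices keep falling this week",
    "loved the new man of steel movie",
]
for tweet in user_tweets:
    print(tweet, "->", is_relevant(tweet, idf, pos_c, neg_c))
print("relevant share:", relevant_share(user_tweets, idf, pos_c, neg_c))
```

The point of the sketch is the data flow: the model never sees a labeled tweet during training, only titles, and the same tweet-level decision is then aggregated per user. A real replication would need the feature sets and classifiers named in the abstract.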
id RCAP_5df5a1039659d20d938a4d2a87f695a3
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/73557
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Funding: Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia.
dc.title.none.fl_str_mv Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Zola, Paola
Cortez, Paulo
Brentari, Eugenio
dc.subject.por.fl_str_mv Text classification
User relevance
Machine learning
Social Media analytics
Natural Sciences::Computer and Information Sciences
Science & Technology
Industry, innovation and infrastructure
publishDate 2021
dc.date.none.fl_str_mv 2021-02
2021-02-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/73557
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 0941-0643
1433-3058
10.1007/s00521-020-04991-8
The original publication is available at: https://doi.org/10.1007/s00521-020-04991-8
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer
publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação