Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers

Zola, Paola; Cortez, Paulo; Brentari, Eugenio

Bibliographic details
Main author:	Zola, Paola
Publication date:	2021
Other authors:	Cortez, Paulo; Brentari, Eugenio
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	https://hdl.handle.net/1822/73557
Abstract:	This paper addresses the nontrivial task of Twitter financial disambiguation (TFD), which is relevant to filter financial domain tweets (e.g., alloy steel or coffee prices) when no unique identifiers (e.g., cashtags) are adopted. To automate TFD, we propose a transfer learning approach that uses freely labeled news titles to train diverse one-class and two-class classification methods. These include different text handling transforms, adaptations of statistical measures and modern machine learning methods, including support vector machines (SVM), deep autoencoders and multilayer perceptrons. As a case study, we analyzed the domain of alloy steel prices, collecting a recent Twitter dataset. Overall, the best results were achieved by a two-class SVM fed with TFD statistical measures and topic model features, obtaining an 80% and 71% discrimination level when tested with 11,081 and 3,000 manually labeled tweets. The best one-class performance (78% and 69% for the same test tweets) was obtained by a term frequency-inverse document frequency classifier (TF-IDFC). These models were further used to generate a Financial User Relevance rank (FUR) score, aiming to filter relevant users. The SVM and TF-IDFC FUR models obtained a predictive user discrimination level of 80% and 75% when tested with a manually labeled test sample of 418 users. These results confirm the proposed joint TFD-FUR approach as a valuable tool for the selection of Twitter texts and users for financial social media analytics (e.g., sentiment analysis, detection of influential users).

Item metadata

id	RCAP_5df5a1039659d20d938a4d2a87f695a3
oai_identifier_str	oai:repositorium.sdum.uminho.pt:1822/73557
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
acknowledgment	Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia.
dc.title.none.fl_str_mv	Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers
author	Zola, Paola
author_facet	Zola, Paola; Cortez, Paulo; Brentari, Eugenio
author_role	author
author2	Cortez, Paulo; Brentari, Eugenio
author2_role	author; author
dc.contributor.none.fl_str_mv	Universidade do Minho
dc.contributor.author.fl_str_mv	Zola, Paola; Cortez, Paulo; Brentari, Eugenio
dc.subject.por.fl_str_mv	Text classification; User relevance; Machine learning; Social Media analytics; Natural Sciences::Computer and Information Sciences; Science & Technology; Industry, innovation and infrastructure
publishDate	2021
dc.date.none.fl_str_mv	2021-02; 2021-02-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/1822/73557
url	https://hdl.handle.net/1822/73557
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	0941-0643; 1433-3058; 10.1007/s00521-020-04991-8; The original publication is available at: https://doi.org/10.1007/s00521-020-04991-8
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Springer
publisher.none.fl_str_mv	Springer
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799132738690744320
