Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria

Bibliographic details
Main author: Pinto, A
Publication date: 2017
Other authors: Oliveira, HG; Figueira, Álvaro; Alves, AO
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://repositorio.inesctec.pt/handle/123456789/5822
http://dx.doi.org/10.1007/s00354-017-0015-1
Abstract: An overwhelming quantity of messages is posted in social networks every minute. To make the use of these platforms more productive, it is imperative to filter out information that is irrelevant to the general audience, such as private messages, personal opinions or well-known facts. This work focuses on the automatic classification of public social text according to its potential relevance from a journalistic point of view, with the aim of improving the overall experience of using a social network. Our experiments were based on a set of posts annotated by human judges according to several criteria, including journalistic relevance. To predict relevance, we rely exclusively on linguistic features extracted by Natural Language Processing tools, regardless of the message's author and their profile information. In our first approach, different classifiers and feature engineering methods were used to predict relevance directly from the selected features. In a second approach, relevance was predicted indirectly, based on an ensemble of classifiers for other criteria that are key to defining relevance and are also annotated in the dataset: controversy, interestingness, meaningfulness, novelty, reliability and scope. The first approach achieved an F1-score of 0.76 and an area under the ROC curve (AUC) of 0.63. The best results, however, were achieved by the second approach, whose best learned model reached an F1-score of 0.84 with an AUC of 0.78. This confirmed that journalistic relevance can indeed be predicted from a combination of the selected criteria, and that linguistic features can be exploited to classify those criteria.
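The abstract describes the indirect approach only at a high level. One plausible reading of "an ensemble of classifiers for other key criteria" is a two-level (stacked) model: one classifier per criterion, trained on linguistic features, whose output probabilities feed a second classifier that predicts relevance. The Python sketch below illustrates that reading, together with the two metrics the abstract reports (F1-score and ROC AUC). It is not the authors' implementation: the logistic-regression base learner, the names train_logreg, train_indirect, f1_score and roc_auc, and the toy data are all hypothetical, chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

# The six criteria listed in the abstract, used here as level-1 targets.
CRITERIA = ["controversy", "interestingness", "meaningfulness",
            "novelty", "reliability", "scope"]

# A trained model maps a feature vector to P(label = 1).
Model = Callable[[Sequence[float]], float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: List[int],
                 epochs: int = 500, lr: float = 0.5) -> Model:
    """Hypothetical base learner: logistic regression fit by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_indirect(X: List[List[float]],
                   criterion_labels: Dict[str, List[int]],
                   relevance: List[int]) -> Model:
    """Stacked sketch of the indirect approach (assumed, not the paper's code)."""
    # Level 1: one classifier per criterion, using linguistic features only.
    level1 = {c: train_logreg(X, criterion_labels[c]) for c in CRITERIA}

    def to_meta(x: Sequence[float]) -> List[float]:
        # Represent a post by its predicted criterion probabilities.
        return [level1[c](x) for c in CRITERIA]

    # Level 2: relevance predicted from the criterion probabilities.
    # A careful setup would train this on out-of-fold level-1 predictions
    # to avoid label leakage; that step is omitted for brevity.
    level2 = train_logreg([to_meta(x) for x in X], relevance)
    return lambda x: level2(to_meta(x))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # AUC = probability that a random positive outscores a random negative
    # (ties count as one half); assumes both classes are present.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy data: two numeric "linguistic features" per post.
    X = [[1.0, 0.2], [0.9, 0.1], [0.8, 0.4], [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
    rel = [1, 1, 1, 0, 0, 0]
    model = train_indirect(X, {c: rel for c in CRITERIA}, rel)
    scores = [model(x) for x in X]
    preds = [int(s >= 0.5) for s in scores]
    print("F1:", f1_score(rel, preds), "AUC:", roc_auc(rel, scores))
```

F1 is computed on thresholded predictions, while AUC is computed on the raw probabilities, which is why a model can score reasonably on one metric and poorly on the other (as with the first approach's 0.76 F1 against 0.63 AUC). In practice, a library such as scikit-learn provides stacking and both metrics ready-made; the plain-Python version is kept here only so the mechanics are visible.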

Record details
OAI identifier: oai:repositorio.inesctec.pt:123456789/5822
Version: Published version
Access rights: Open access
Format: application/pdf
Added to repository: 2018-01-10
Aggregator: RCAAP, Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação