Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria
Autor(a) principal: | |
---|---|
Data de Publicação: | 2017 |
Outros Autores: | , , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Texto Completo: | http://repositorio.inesctec.pt/handle/123456789/5822 http://dx.doi.org/10.1007/s00354-017-0015-1 |
Resumo: | An overwhelming quantity of messages is posted in social networks every minute. To make the utilization of these platforms more productive, it is imperative to filter out information that is irrelevant to the general audience, such as private messages, personal opinions or well-known facts. This work is focused on the automatic classification of public social text according to its potential relevance, from a journalistic point of view, hopefully improving the overall experience of using a social network. Our experiments were based on a set of posts with several criteria, including the journalistic relevance, assessed by human judges. To predict the latter, we rely exclusively on linguistic features, extracted by Natural Language Processing tools, regardless the author of the message and its profile information. In our first approach, different classifiers and feature engineering methods were used to predict relevance directly from the selected features. In a second approach, relevance was predicted indirectly, based on an ensemble of classifiers for other key criteria when defining relevance-controversy, interestingness, meaningfulness, novelty, reliability and scope-also in the dataset. The first approach achieved a F (1)-score of 0.76 and an Area under the ROC curve (AUC) of 0.63. But the best results were achieved by the second approach, with the best learned model achieving a F (1)-score of 0.84 with an AUC of 0.78. This confirmed that journalistic relevance can indeed be predicted by the combination of the selected criteria, and that linguistic features can be exploited to classify the latter. |
id |
RCAP_d8aee103fe31696627cd705288082ee5 |
---|---|
oai_identifier_str |
oai:repositorio.inesctec.pt:123456789/5822 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic CriteriaAn overwhelming quantity of messages is posted in social networks every minute. To make the utilization of these platforms more productive, it is imperative to filter out information that is irrelevant to the general audience, such as private messages, personal opinions or well-known facts. This work is focused on the automatic classification of public social text according to its potential relevance, from a journalistic point of view, hopefully improving the overall experience of using a social network. Our experiments were based on a set of posts with several criteria, including the journalistic relevance, assessed by human judges. To predict the latter, we rely exclusively on linguistic features, extracted by Natural Language Processing tools, regardless the author of the message and its profile information. In our first approach, different classifiers and feature engineering methods were used to predict relevance directly from the selected features. In a second approach, relevance was predicted indirectly, based on an ensemble of classifiers for other key criteria when defining relevance-controversy, interestingness, meaningfulness, novelty, reliability and scope-also in the dataset. The first approach achieved a F (1)-score of 0.76 and an Area under the ROC curve (AUC) of 0.63. But the best results were achieved by the second approach, with the best learned model achieving a F (1)-score of 0.84 with an AUC of 0.78. This confirmed that journalistic relevance can indeed be predicted by the combination of the selected criteria, and that linguistic features can be exploited to classify the latter.2018-01-10T10:19:39Z2017-01-01T00:00:00Z2017info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://repositorio.inesctec.pt/handle/123456789/5822http://dx.doi.org/10.1007/s00354-017-0015-1engPinto,AOliveira,HGÁlvaro FigueiraAlves,AOinfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-05-15T10:20:12Zoai:repositorio.inesctec.pt:123456789/5822Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T17:52:48.781392Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse |
dc.title.none.fl_str_mv |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria |
title |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria |
spellingShingle |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria Pinto,A |
title_short |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria |
title_full |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria |
title_fullStr |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria |
title_full_unstemmed |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria |
title_sort |
Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria |
author |
Pinto,A |
author_facet |
Pinto,A Oliveira,HG Álvaro Figueira Alves,AO |
author_role |
author |
author2 |
Oliveira,HG Álvaro Figueira Alves,AO |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Pinto,A Oliveira,HG Álvaro Figueira Alves,AO |
description |
An overwhelming quantity of messages is posted in social networks every minute. To make the utilization of these platforms more productive, it is imperative to filter out information that is irrelevant to the general audience, such as private messages, personal opinions or well-known facts. This work is focused on the automatic classification of public social text according to its potential relevance, from a journalistic point of view, hopefully improving the overall experience of using a social network. Our experiments were based on a set of posts with several criteria, including the journalistic relevance, assessed by human judges. To predict the latter, we rely exclusively on linguistic features, extracted by Natural Language Processing tools, regardless the author of the message and its profile information. In our first approach, different classifiers and feature engineering methods were used to predict relevance directly from the selected features. In a second approach, relevance was predicted indirectly, based on an ensemble of classifiers for other key criteria when defining relevance-controversy, interestingness, meaningfulness, novelty, reliability and scope-also in the dataset. The first approach achieved a F (1)-score of 0.76 and an Area under the ROC curve (AUC) of 0.63. But the best results were achieved by the second approach, with the best learned model achieving a F (1)-score of 0.84 with an AUC of 0.78. This confirmed that journalistic relevance can indeed be predicted by the combination of the selected criteria, and that linguistic features can be exploited to classify the latter. |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-01-01T00:00:00Z 2017 2018-01-10T10:19:39Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.inesctec.pt/handle/123456789/5822 http://dx.doi.org/10.1007/s00354-017-0015-1 |
url |
http://repositorio.inesctec.pt/handle/123456789/5822 http://dx.doi.org/10.1007/s00354-017-0015-1 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799131603420577792 |