MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification

Bibliographic details
Main author: Petukhova, Alina
Publication date: 2023
Other authors: Fachada, Nuno
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://doi.org/10.3390/data8050074
http://hdl.handle.net/10437/13813
Abstract: This article presents a dataset of 10,917 news articles with hierarchical news categories, collected between 1 January 2019 and 31 December 2019. We manually labeled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories. The dataset can be used to train machine learning models that automatically classify news articles by topic, and may help researchers working on news structuring, news classification, and the prediction of future events from published news.
Keywords: news dataset; text classification; NLP; media topic taxonomy
Record ID: RCAP_dcb6379e13ee6cc7562e456d0c5ebe0e
OAI identifier: oai:recil.ensinolusofona.pt:10437/13813
Repository network: RCAP
Repository ID: 7160
Subjects: Data collection; News; Natural language processing; Media; Data processing; Taxonomy; Computer science
Publisher: MDPI
ISSN: 2306-5729
Issued: 2023-04-23 (deposited in the repository on 2023-04-28)
Version: published version
Access rights: open access
Format: PDF (application/pdf)
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)