Topic Modeling: A Consistent Framework for Comparative Studies
Main author: | Amaro, Ana |
---|---|
Publication date: | 2024 |
Other authors: | Bação, Fernando |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/164320 |
Citation and funding: | Amaro, A., & Bação, F. (2024). Topic Modeling: A Consistent Framework for Comparative Studies. Emerging Science Journal, 8(1), 125-139. https://doi.org/10.28991/ESJ-2024-08-01-09. This work was supported by a grant of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”), DSAIPA/DS/0116/2019, and by project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC). |
id |
RCAP_208aef6854f7b49df7e0efd142ab1e8c |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/164320 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
abstract |
In recent years, the field of Topic Modeling (TM) has grown in importance due to the increasing availability of digital text data. TM is an unsupervised learning technique that helps uncover latent semantic structures in large sets of documents, making it a valuable tool for finding relevant patterns. However, evaluating the performance of TM algorithms can be challenging, as different metrics and datasets are often used, leading to inconsistent results. In addition, many current surveys of TM algorithms focus on a limited number of models and exclude state-of-the-art approaches. This paper addresses these issues by presenting a comprehensive comparative study of five TM algorithms across three benchmark datasets using five metrics. We offer an updated survey of the latest TM approaches and evaluation metrics, providing a consistent framework for comparing different algorithms while introducing state-of-the-art approaches that have been disregarded in the literature. The experiments, which primarily use Context Vectors (CV) Topic Coherence as an evaluation metric, show that Top2Vec is the best-performing model across all datasets, disrupting the tendency for Latent Dirichlet Allocation to be the best performer. |
dc.title.none.fl_str_mv |
Topic Modeling: A Consistent Framework for Comparative Studies |
author |
Amaro, Ana |
author_role |
author |
author2 |
Bação, Fernando |
author2_role |
author |
dc.contributor.none.fl_str_mv |
NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN |
dc.contributor.author.fl_str_mv |
Amaro, Ana; Bação, Fernando |
dc.subject.por.fl_str_mv |
Natural Language Processing; Top2Vec; Topic Coherence; Topic Modeling; Unsupervised Learning; General |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-03-01T00:27:16Z; 2024-02-01; 2024-02-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/164320 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
2610-9182; PURE: 84294537; https://doi.org/10.28991/ESJ-2024-08-01-09 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
15 application/pdf |
dc.source.none.fl_str_mv |
reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP |