Topic Modeling

Bibliographic details
Main author: Amaro, Ana
Publication date: 2024
Other authors: Bação, Fernando
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/164320
Summary: Amaro, A., & Bação, F. (2024). Topic Modeling: A Consistent Framework for Comparative Studies. Emerging Science Journal, 8(1), 125-139. https://doi.org/10.28991/ESJ-2024-08-01-09 --- This work was supported by a grant of the Portuguese Foundation for Science and Technology ("Fundação para a Ciência e a Tecnologia"), DSAIPA/DS/0116/2019, and project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC).
id RCAP_208aef6854f7b49df7e0efd142ab1e8c
oai_identifier_str oai:run.unl.pt:10362/164320
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Topic Modeling: A Consistent Framework for Comparative Studies
Natural Language Processing; Top2Vec; Topic Coherence; Topic Modeling; Unsupervised Learning; General
In recent years, the field of Topic Modeling (TM) has grown in importance due to the increasing availability of digital text data. TM is an unsupervised learning technique that helps uncover latent semantic structures in large sets of documents, making it a valuable tool for finding relevant patterns. However, evaluating the performance of TM algorithms can be challenging, as different metrics and datasets are often used, leading to inconsistent results. In addition, many current surveys of TM algorithms focus on a limited number of models and exclude state-of-the-art approaches. This paper addresses these issues by presenting a comprehensive comparative study of five TM algorithms across three benchmark datasets using five metrics. We offer an updated survey of the latest TM approaches and evaluation metrics, providing a consistent framework for comparing different algorithms while introducing state-of-the-art approaches that have been disregarded in the literature. The experiments, which primarily use Context Vectors (CV) Topic Coherence as an evaluation metric, show that Top2Vec is the best-performing model across all datasets, disrupting the tendency for Latent Dirichlet Allocation to be the best performer.
2024-03-11T05:51:56Z
oai:run.unl.pt:10362/164320
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T04:00:08.473333
false
dc.title.none.fl_str_mv Topic Modeling
A Consistent Framework for Comparative Studies
title Topic Modeling
spellingShingle Topic Modeling
Amaro, Ana
Natural Language Processing
Top2Vec
Topic Coherence
Topic Modeling
Unsupervised Learning
General
author Amaro, Ana
author_facet Amaro, Ana
Bação, Fernando
author_role author
author2 Bação, Fernando
author2_role author
dc.contributor.none.fl_str_mv NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
dc.contributor.author.fl_str_mv Amaro, Ana
Bação, Fernando
dc.subject.por.fl_str_mv Natural Language Processing
Top2Vec
Topic Coherence
Topic Modeling
Unsupervised Learning
General
publishDate 2024
dc.date.none.fl_str_mv 2024-03-01T00:27:16Z
2024-02-01
2024-02-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/164320
url http://hdl.handle.net/10362/164320
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2610-9182
PURE: 84294537
https://doi.org/10.28991/ESJ-2024-08-01-09
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 15
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138177206714368