Latent association rule cluster based model to extract topics for classification and recommendation applications

Santos, Fabiano Fernandes dos; Domingues, Marcos Aurelio; Sundermann, Camila Vaccari; Carvalho, Veronica Oliveira de [UNESP]; Moura, Maria Fernanda; Rezende, Solange Oliveira

Bibliographic details
Main author:	Santos, Fabiano Fernandes dos
Publication date:	2018
Other authors:	Domingues, Marcos Aurelio; Sundermann, Camila Vaccari; Carvalho, Veronica Oliveira de [UNESP]; Moura, Maria Fernanda; Rezende, Solange Oliveira
Document type:	Article
Language:	eng
Source title:	Repositório Institucional da UNESP
Full text:	http://dx.doi.org/10.1016/j.eswa.2018.06.021 http://hdl.handle.net/11449/186232
Abstract:	The quality of any text mining technique is highly dependent on the features that are used to represent the document collection. A classical form of document representation is the vector space model (VSM), in which documents are represented as vectors of weights that correspond to the features of the documents. The bag-of-words model is the most popular VSM approach due to its simplicity and general applicability, but this model does not capture term dependency and has high dimensionality. In the literature, several models for document representation have been proposed in order to capture the dependency of terms. Among them, the topic model representation is one of the most interesting approaches, since it describes the collection of documents in a way that reveals their internal structure and the interrelationships therein, and also provides dimensionality reduction. However, even for topic models, the efficient extraction of information concerning the relations among terms for document representation is still a major research challenge. In order to address this issue, we proposed the latent association rule cluster based model (LARCM). The LARCM is a non-probabilistic topic model that makes use of association rule clustering to build a document representation with low dimensionality in such a way that each feature (i.e., topic) comprises information concerning relations among the terms. We evaluated the interpretability of the topics obtained by using our proposed model against the ones provided by the traditional latent Dirichlet allocation (LDA) model and the LDA model using a document representation that includes correlated terms (i.e., bag-of-related-words). The experimental results indicated that the LARCM provides topics with better interpretability than the LDA models. Additionally, we used the topics obtained by the LARCM in two different applications: text classification and page recommendation. With respect to text classification, the topics were used to improve document collection representation. Concerning page recommendation, topics were used as contextual information in context-aware recommender systems. Results have shown that the topics provided by the LARCM can be used to improve both applications. (C) 2018 Elsevier Ltd. All rights reserved.

Item metadata

id	UNSP_966bd50e86e50e660e05090bbd2f6471
oai_identifier_str	oai:repositorio.unesp.br:11449/186232
network_acronym_str	UNSP
network_name_str	Repositório Institucional da UNESP
repository_id_str	2946
spelling	Latent association rule cluster based model to extract topics for classification and recommendation applications; Document representation; Topic model; Association rules; Clustering; Text classification; Context-aware recommender systems; Araucaria Foundation (Parana/Brazil); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Univ Sao Paulo, Inst Math & Comp Sci, Ave Trabalhador Sao Carlense 400, BR-13566590 Sao Carlos, SP, Brazil; Univ Estadual Maringa, Dept Informat, Ave Colombo, BR-87020900 Maringa, Parana, Brazil; State Univ Sao Paulo, Inst Geosci & Exact Sci, 24 A, BR-13506900 Rio Claro, SP, Brazil; Embrapa Agr Informat, Ave Dr Andre Tosello, BR-13083886 Campinas, SP, Brazil; Elsevier B.V.; Universidade de São Paulo (USP); Universidade Estadual de Maringá (UEM); Universidade Estadual Paulista (Unesp); Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA); Santos, Fabiano Fernandes dos; Domingues, Marcos Aurelio; Sundermann, Camila Vaccari; Carvalho, Veronica Oliveira de [UNESP]; Moura, Maria Fernanda; Rezende, Solange Oliveira; 2019-10-04T13:28:17Z; 2018-12-01; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; 34-60; http://dx.doi.org/10.1016/j.eswa.2018.06.021; Expert Systems With Applications. Oxford: Pergamon-Elsevier Science Ltd, v. 112, p. 34-60, 2018; 0957-4174; http://hdl.handle.net/11449/186232; 10.1016/j.eswa.2018.06.021; WOS:000442708600003; Web of Science; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Expert Systems With Applications; info:eu-repo/semantics/openAccess; 2021-10-22T19:03:47Z; oai:repositorio.unesp.br:11449/186232; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T14:19:34.252983; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false
dc.title.none.fl_str_mv	Latent association rule cluster based model to extract topics for classification and recommendation applications
author	Santos, Fabiano Fernandes dos
author_facet	Santos, Fabiano Fernandes dos; Domingues, Marcos Aurelio; Sundermann, Camila Vaccari; Carvalho, Veronica Oliveira de [UNESP]; Moura, Maria Fernanda; Rezende, Solange Oliveira
author_role	author
author2	Domingues, Marcos Aurelio; Sundermann, Camila Vaccari; Carvalho, Veronica Oliveira de [UNESP]; Moura, Maria Fernanda; Rezende, Solange Oliveira
author2_role	author author author author author
dc.contributor.none.fl_str_mv	Universidade de São Paulo (USP); Universidade Estadual de Maringá (UEM); Universidade Estadual Paulista (Unesp); Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA)
dc.contributor.author.fl_str_mv	Santos, Fabiano Fernandes dos; Domingues, Marcos Aurelio; Sundermann, Camila Vaccari; Carvalho, Veronica Oliveira de [UNESP]; Moura, Maria Fernanda; Rezende, Solange Oliveira
dc.subject.por.fl_str_mv	Document representation; Topic model; Association rules; Clustering; Text classification; Context-aware recommender systems
topic	Document representation; Topic model; Association rules; Clustering; Text classification; Context-aware recommender systems
publishDate	2018
dc.date.none.fl_str_mv	2018-12-01 2019-10-04T13:28:17Z 2019-10-04T13:28:17Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://dx.doi.org/10.1016/j.eswa.2018.06.021; Expert Systems With Applications. Oxford: Pergamon-Elsevier Science Ltd, v. 112, p. 34-60, 2018; 0957-4174; http://hdl.handle.net/11449/186232; 10.1016/j.eswa.2018.06.021; WOS:000442708600003
url	http://dx.doi.org/10.1016/j.eswa.2018.06.021 http://hdl.handle.net/11449/186232
identifier_str_mv	Expert Systems With Applications. Oxford: Pergamon-Elsevier Science Ltd, v. 112, p. 34-60, 2018; 0957-4174; 10.1016/j.eswa.2018.06.021; WOS:000442708600003
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Expert Systems With Applications
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	34-60
dc.publisher.none.fl_str_mv	Elsevier B.V.
publisher.none.fl_str_mv	Elsevier B.V.
dc.source.none.fl_str_mv	Web of Science reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP
instname_str	Universidade Estadual Paulista (UNESP)
instacron_str	UNESP
institution	UNESP
reponame_str	Repositório Institucional da UNESP
collection	Repositório Institucional da UNESP
repository.name.fl_str_mv	Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv
_version_	1808128219072692224
