An Experimental Analysis of Optimal Hybrid Word Embedding Methods for Text Classification Using a Movie Review Dataset

Alagarsamy, Sandhya; James, Visumathi; Raj, Raja Soosaimarian Peter

Bibliographic details
Main author:	Alagarsamy, Sandhya
Publication date:	2022
Other authors:	James, Visumathi; Raj, Raja Soosaimarian Peter
Document type:	Article
Language:	eng
Source title:	Brazilian Archives of Biology and Technology
Full text:	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132022000100617
Abstract:	Today, a wealth of data is produced over the internet from multiple sources, giving rise to the term big data. Much of this data is text. This work focuses on text classification of a movie review dataset using Hybrid Word Embedding (HWE) models and on deriving the optimal text classification model. In text processing, efficient handling of the words and sentences in a document plays a vital role. Traditional methods such as Bag of Words (BoW) do not capture semantic correlation among words. Further, the words in a document are not always processed in order, so certain words are not processed at all, which creates data sparsity problems. To overcome the data sparsity problem, the proposed work applies hybrid word embedding using the WordNet repository. The hybrid model is built with three word embedding methods, namely an embedding layer, Word2Vec and GloVe, in combination with a deep learning Convolutional Neural Network (CNN). The results obtained for the movie review dataset were compared and the optimal classification model was identified. The evaluation metrics include Log Loss, Area Under Curve (AUC), Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG), Mean Absolute Error (MAE), Error Rate (ERR), Matthews Correlation Coefficient (MCC), Training Accuracy, Test Accuracy, Precision, Recall and F1 score. The experimental results indicate that Word2Vec is the optimal hybrid word embedding model for classifying the chosen movie review dataset.
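Several of the metrics listed in the abstract can be derived from the confusion matrix of a binary sentiment classifier. The following is a minimal Python sketch, not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration, and the record does not state how the paper computed these values.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Illustrative metrics for binary labels (1 = positive review).

    y_true: list of 0/1 ground-truth labels.
    y_prob: list of predicted probabilities for the positive class.
    The threshold of 0.5 is an assumption, not taken from the paper.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews Correlation Coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Log loss, with probabilities clipped away from 0 and 1 to avoid log(0).
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / n

    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "log_loss": log_loss,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 0, 0]
    probs = [0.9, 0.4, 0.2, 0.6]
    for name, value in binary_metrics(labels, probs).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four confusion-matrix cells, which may be one reason to report it alongside accuracy.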

Item metadata

id	TECPAR-1_f10e7ad3c8eb0289858dde831ac2330e
oai_identifier_str	oai:scielo:S1516-89132022000100617
network_acronym_str	TECPAR-1
network_name_str	Brazilian Archives of Biology and Technology
repository_id_str
dc.title.none.fl_str_mv	An Experimental Analysis of Optimal Hybrid Word Embedding Methods for Text Classification Using a Movie Review Dataset
author	Alagarsamy,Sandhya
author_facet	Alagarsamy,Sandhya James,Visumathi Raj,Raja Soosaimarian Peter
author_role	author
author2	James,Visumathi Raj,Raja Soosaimarian Peter
author2_role	author author
dc.contributor.author.fl_str_mv	Alagarsamy,Sandhya James,Visumathi Raj,Raja Soosaimarian Peter
dc.subject.por.fl_str_mv	Hybrid Word Embedding; Natural Language Processing; Deep Neural Network; Text Classification; CNN
publishDate	2022
dc.date.none.fl_str_mv	2022-01-01
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132022000100617
url	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132022000100617
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	10.1590/1678-4324-2022210830
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	text/html
dc.publisher.none.fl_str_mv	Instituto de Tecnologia do Paraná - Tecpar
publisher.none.fl_str_mv	Instituto de Tecnologia do Paraná - Tecpar
dc.source.none.fl_str_mv	Brazilian Archives of Biology and Technology v.65 2022 reponame:Brazilian Archives of Biology and Technology instname:Instituto de Tecnologia do Paraná (Tecpar) instacron:TECPAR
instname_str	Instituto de Tecnologia do Paraná (Tecpar)
instacron_str	TECPAR
institution	TECPAR
reponame_str	Brazilian Archives of Biology and Technology
collection	Brazilian Archives of Biology and Technology
repository.name.fl_str_mv	Brazilian Archives of Biology and Technology - Instituto de Tecnologia do Paraná (Tecpar)
repository.mail.fl_str_mv	babt@tecpar.br
_version_	1750318281679437824
