On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms

Bibliographic details
Main author: Moura, André
Publication date: 2023
Other authors: Lima, Pedro; Mendonça, Fábio; Mostafa, Sheikh Shanawaz; Dias, Fernando Morgado
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10400.13/5558
Abstract: Chatbots are becoming increasingly popular and require the ability to interpret natural language to provide clear communication with humans. To achieve this, intent detection is crucial. However, current applications typically need a significant amount of annotated data, which is time-consuming and expensive to acquire. This article assesses the effectiveness of different text representations for annotating unlabeled dialog data through a pipeline that examines both classical approaches and pre-trained transformer models for word embedding. The resulting embeddings were then used to create sentence embeddings through pooling, followed by dimensionality reduction, before being fed into a clustering algorithm to determine the user’s intents. Therefore, various pooling, dimension reduction, and clustering algorithms were evaluated to determine the most appropriate approach. The evaluation dataset contains a variety of user intents across different domains, with varying intent taxonomies within the same domain. Results demonstrate that transformer-based models provide better text representations than classical approaches. However, combining several clustering algorithms and embeddings from dissimilar origins through ensemble clustering considerably improves the final clustering solution. Additionally, applying the uniform manifold approximation and projection algorithm for dimension reduction can substantially improve performance (up to 20%) while using a much smaller representation.
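Illustrative sketch (not taken from the article): the pooling and clustering stages described in the abstract can be shown on toy data. Everything here is an assumption made for illustration: the 2-D token vectors are hypothetical stand-ins for transformer word embeddings, k-means is used only as one example of a clustering algorithm (the abstract does not name the ones evaluated), and the dimension reduction stage (UMAP in the abstract) is omitted to keep the example dependency-free.

```python
# Toy pipeline: token embeddings -> pooled sentence embeddings -> clusters.
# Hypothetical data and helper names; not the authors' implementation.

def mean_pool(vectors):
    """Average a list of equal-length vectors component by component."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]

def max_pool(vectors):
    """Take the component-wise maximum over a list of vectors."""
    dim = len(vectors[0])
    return [max(vec[i] for vec in vectors) for i in range(dim)]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_pool(members)
    return labels

# Hypothetical token embeddings for four utterances (one inner list per token).
utterances = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0], [0.2, 0.8]],
    [[0.9, 0.1], [1.0, 0.1], [0.8, 0.1]],
    [[0.1, 0.9], [0.0, 0.8]],
]

sentence_vectors = [mean_pool(tokens) for tokens in utterances]
labels = kmeans(sentence_vectors, k=2)
print(labels)
```

In this toy run the first and third utterances fall into one cluster and the second and fourth into the other; swapping `mean_pool` for `max_pool` is one way to compare pooling strategies on the same data.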
id RCAP_91054c5cbd4b14048e51b90462009406
oai_identifier_str oai:digituma.uma.pt:10400.13/5558
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
title On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
spellingShingle On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
Moura, André
BERT
Chatbots
Embedding clustering
Intent detection
Natural language processing
Natural language understanding
RoBERTa
Word and sentence embedding
.
Faculdade de Ciências Exatas e da Engenharia
title_short On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
title_full On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
title_fullStr On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
title_full_unstemmed On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
title_sort On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
author Moura, André
author_facet Moura, André
Lima, Pedro
Mendonça, Fábio
Mostafa, Sheikh Shanawaz
Dias, Fernando Morgado
author_role author
author2 Lima, Pedro
Mendonça, Fábio
Mostafa, Sheikh Shanawaz
Dias, Fernando Morgado
author2_role author
author
author
author
dc.contributor.none.fl_str_mv DigitUMa
dc.contributor.author.fl_str_mv Moura, André
Lima, Pedro
Mendonça, Fábio
Mostafa, Sheikh Shanawaz
Dias, Fernando Morgado
dc.subject.por.fl_str_mv BERT
Chatbots
Embedding clustering
Intent detection
Natural language processing
Natural language understanding
RoBERTa
Word and sentence embedding
.
Faculdade de Ciências Exatas e da Engenharia
topic BERT
Chatbots
Embedding clustering
Intent detection
Natural language processing
Natural language understanding
RoBERTa
Word and sentence embedding
.
Faculdade de Ciências Exatas e da Engenharia
publishDate 2023
dc.date.none.fl_str_mv 2023
2023-01-01T00:00:00Z
2024-02-16T11:41:06Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.13/5558
url http://hdl.handle.net/10400.13/5558
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Moura, A.; Lima, P.; Mendonça, F.; Mostafa, S.S.; Dias, F. M. On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms. Appl. Sci. 2023, 13, 5178. https://doi.org/10.3390/app13085178
10.3390/app13085178
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv MDPI
publisher.none.fl_str_mv MDPI
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137439194808320