On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
Main author: | Moura, André |
---|---|
Publication date: | 2023 |
Other authors: | Lima, Pedro; Mendonça, Fábio; Mostafa, Sheikh Shanawaz; Dias, Fernando Morgado |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10400.13/5558 |
Abstract: | Chatbots are becoming increasingly popular and require the ability to interpret natural language to communicate clearly with humans. To achieve this, intent detection is crucial. However, current applications typically need a significant amount of annotated data, which is time-consuming and expensive to acquire. This article assesses the effectiveness of different text representations for annotating unlabeled dialog data through a pipeline that examines both classical approaches and pre-trained transformer models for word embedding. The resulting embeddings were then used to create sentence embeddings through pooling, followed by dimensionality reduction, before being fed into a clustering algorithm to determine the user’s intents. To this end, various pooling, dimensionality reduction, and clustering algorithms were evaluated to determine the most appropriate approach. The evaluation dataset contains a variety of user intents across different domains, with varying intent taxonomies within the same domain. Results demonstrate that transformer-based models provide better text representations than classical approaches. However, combining several clustering algorithms and embeddings from dissimilar origins through ensemble clustering considerably improves the final clustering solution. Additionally, applying the uniform manifold approximation and projection (UMAP) algorithm for dimensionality reduction can substantially improve performance (by up to 20%) while using a much smaller representation. |
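
The abstract outlines a pipeline: sentence embeddings obtained by pooling word embeddings, a dimensionality-reduction step, clustering, and an ensemble that combines several clusterings. The sketch below is a minimal, hypothetical illustration of that flow in Python with NumPy; it is not the authors' implementation. Assumptions: token embeddings are already available from some pre-trained model (random vectors stand in for them); mean pooling is only one possible pooling choice; linear PCA via SVD is swapped in for UMAP, which would come from the separate umap-learn package; and the ensemble is a co-association (evidence accumulation) consensus over k-means runs with different seeds and cluster counts, whereas the paper combines different algorithms and embeddings. All function names are illustrative.

```python
from typing import Sequence

import numpy as np


def mean_pool(token_emb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, skipping padding positions.

    token_emb: (n_sentences, n_tokens, dim); mask: (n_sentences, n_tokens), 1 = real token.
    """
    m = mask[..., None].astype(token_emb.dtype)
    summed = (token_emb * m).sum(axis=1)
    counts = np.clip(m.sum(axis=1), 1e-9, None)  # guard against all-padding rows
    return summed / counts


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Linear PCA via SVD, used here only as a stand-in for UMAP."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(x: np.ndarray, k: int, seed: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means with initial centroids sampled from the data points."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep a centroid in place if its cluster became empty.
        new_centroids = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def co_association(partitions: Sequence[np.ndarray]) -> np.ndarray:
    """Fraction of base partitions that place each pair of samples together."""
    n = len(partitions[0])
    co = np.zeros((n, n))
    for labels in partitions:
        co += (labels[:, None] == labels[None, :]).astype(float)
    return co / len(partitions)


def consensus(co: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Connected components of the graph linking pairs with co-association > threshold.

    This corresponds to a single-linkage cut of the distance 1 - co at 1 - threshold.
    """
    n = co.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-associated pairs
            i = stack.pop()
            for j in np.flatnonzero(co[i] > threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical token embeddings: two groups of three utterances, 4 tokens, 8 dims.
    tok = np.concatenate([5.0 + 0.1 * rng.standard_normal((3, 4, 8)),
                          -5.0 + 0.1 * rng.standard_normal((3, 4, 8))])
    tok[:, 3, :] = 100.0  # padding values, which pooling must ignore
    mask = np.ones((6, 4))
    mask[:, 3] = 0
    reduced = pca_reduce(mean_pool(tok, mask), n_components=2)
    base = [kmeans(reduced, 2, seed=s) for s in range(5)]
    base += [kmeans(reduced, 3, seed=s) for s in range(2)]
    print(consensus(co_association(base)))
```

Thresholding the co-association matrix at 0.5 keeps only the pairs that a majority of base clusterings agree on, so the number of consensus clusters does not have to be fixed in advance.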
id | RCAP_91054c5cbd4b14048e51b90462009406 |
---|---|
oai_identifier_str | oai:digituma.uma.pt:10400.13/5558 |
network_acronym_str | RCAP |
repository_id_str | 7160 |
dc.contributor.none.fl_str_mv | DigitUMa |
dc.contributor.author.fl_str_mv | Moura, André; Lima, Pedro; Mendonça, Fábio; Mostafa, Sheikh Shanawaz; Dias, Fernando Morgado |
dc.subject.por.fl_str_mv | BERT; Chatbots; Embedding clustering; Intent detection; Natural language processing; Natural language understanding; RoBERTa; Word and sentence embedding; Faculdade de Ciências Exatas e da Engenharia |
dc.date.none.fl_str_mv | 2023; 2023-01-01T00:00:00Z; 2024-02-16T11:41:06Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
dc.relation.none.fl_str_mv | Moura, A.; Lima, P.; Mendonça, F.; Mostafa, S.S.; Dias, F. M. On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms. Appl. Sci. 2023, 13, 5178. https://doi.org/10.3390/app13085178 10.3390/app13085178 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | MDPI |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |