Empowering deaf-hearing communication: Exploring synergies between predictive and generative AI-based strategies towards (Portuguese) sign language interpretation
Main author: | Adão, Telmo |
---|---|
Publication date: | 2023 |
Other authors: | Oliveira, José A.; Shahrabadi, Somayeh; Jesus, Hugo; Fernandes, Marco; Costa, Ângelo; Ferreira, Vânia; Gonçalves, Martinho Fradeira; López, Miguel A. Guevara; Peres, Emanuel; Magalhães, Luís Gonzaga Mendes |
Document type: | Article |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://hdl.handle.net/1822/88997 |
Abstract: | Communication between Deaf and hearing individuals remains a persistent challenge requiring attention to foster inclusivity. Despite notable efforts in the development of digital solutions for sign language recognition (SLR), several issues persist, such as cross-platform interoperability and strategies for tokenizing signs to enable continuous conversations and coherent sentence construction. To address such issues, this paper proposes a non-invasive Portuguese Sign Language (Língua Gestual Portuguesa or LGP) interpretation system-as-a-service, leveraging skeletal posture sequence inference powered by long short-term memory (LSTM) architectures. To address the scarcity of examples during machine learning (ML) model training, dataset augmentation strategies are explored. Additionally, a buffer-based interaction technique is introduced to facilitate the tokenization of LGP terms. This technique provides real-time feedback to users, allowing them to gauge the time remaining to complete a sign, which aids in the construction of grammatically coherent sentences based on inferred terms/words. To support human-like conditioning rules for interpretation, a large language model (LLM) service is integrated. Experiments reveal that LSTM-based neural networks, trained with 50 LGP terms and subjected to data augmentation, achieved accuracy levels ranging from 80% to 95.6%. Users unanimously reported a high level of intuition when using the buffer-based interaction strategy for term/word tokenization. Furthermore, tests with an LLM—specifically ChatGPT—demonstrated promising semantic correlation rates in generated sentences, comparable to expected sentences. |
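The buffer-based tokenization strategy summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name, the fixed window length, and the dominance threshold are all assumptions. Per-frame sign predictions accumulate in a fixed-size buffer, the remaining capacity doubles as the "time remaining" feedback shown to the user, and a term is emitted only when the buffer fills with a sufficiently dominant label.

```python
from collections import Counter, deque

class SignBuffer:
    """Fixed-size buffer of per-frame sign predictions that emits a token
    when full. Purely illustrative: window size, threshold, and all names
    are assumptions, not taken from the paper."""

    def __init__(self, window=30, threshold=0.8):
        self.window = window        # frames needed to complete one sign
        self.threshold = threshold  # fraction of frames that must agree
        self.frames = deque(maxlen=window)

    def remaining(self):
        """Frames still needed -- the real-time feedback for the signer."""
        return self.window - len(self.frames)

    def push(self, label):
        """Add one per-frame prediction; return a token once the buffer fills."""
        self.frames.append(label)
        if len(self.frames) < self.window:
            return None
        top, count = Counter(self.frames).most_common(1)[0]
        self.frames.clear()         # start buffering the next sign
        return top if count / self.window >= self.threshold else None
```

In use, each video frame's inferred label is pushed into the buffer; most pushes return `None` while the countdown runs, and a term is produced only when one label dominates the completed window, which is what lets downstream sentence construction (here, the LLM stage) work on discrete terms rather than raw frame streams.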
id |
RCAP_39197c900a12fa4f61cc8104850d5989 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/88997 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
funding |
FCT - Fundação para a Ciência e a Tecnologia (C644866286-00000011) |
authors |
Adão, Telmo; Oliveira, José A.; Shahrabadi, Somayeh; Jesus, Hugo; Fernandes, Marco; Costa, Ângelo; Ferreira, Vânia; Gonçalves, Martinho Fradeira; López, Miguel A. Guevara; Peres, Emanuel; Magalhães, Luís Gonzaga Mendes |
contributor |
Universidade do Minho |
subjects |
Deaf-hearing communication; Generative pre-trained transformer (GPT); Inclusion; Large language models (LLM); Long short-term memory (LSTM); Machine learning (ML); Portuguese sign language; Sign language recognition (SLR); Video-based motion analytics |
publishDate |
2023-11-01 |
document type |
article (published version) |
doi |
10.3390/jimaging9110235 |
rights |
open access (info:eu-repo/semantics/openAccess) |
format |
application/pdf |
publisher |
MDPI |
repository |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
institution |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
collection acronym |
RCAAP |