Empowering deaf-hearing communication: Exploring synergies between predictive and generative AI-based strategies towards (Portuguese) sign language interpretation

Bibliographic details
Lead author: Adão, Telmo
Publication date: 2023
Other authors: Oliveira, José A.; Shahrabadi, Somayeh; Jesus, Hugo; Fernandes, Marco; Costa, Ângelo; Ferreira, Vânia; Gonçalves, Martinho Fradeira; López, Miguel A. Guevara; Peres, Emanuel; Magalhães, Luís Gonzaga Mendes
Document type: Article
Language: English (eng)
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://hdl.handle.net/1822/88997
Abstract: Communication between Deaf and hearing individuals remains a persistent challenge requiring attention to foster inclusivity. Despite notable efforts in the development of digital solutions for sign language recognition (SLR), several issues persist, such as cross-platform interoperability and strategies for tokenizing signs to enable continuous conversations and coherent sentence construction. To address such issues, this paper proposes a non-invasive Portuguese Sign Language (Língua Gestual Portuguesa or LGP) interpretation system-as-a-service, leveraging skeletal posture sequence inference powered by long short-term memory (LSTM) architectures. To address the scarcity of examples during machine learning (ML) model training, dataset augmentation strategies are explored. Additionally, a buffer-based interaction technique is introduced to facilitate the tokenization of LGP terms. This technique provides real-time feedback to users, allowing them to gauge the time remaining to complete a sign, which aids in the construction of grammatically coherent sentences based on inferred terms/words. To support human-like conditioning rules for interpretation, a large language model (LLM) service is integrated. Experiments reveal that LSTM-based neural networks, trained with 50 LGP terms and subjected to data augmentation, achieved accuracy levels ranging from 80% to 95.6%. Users unanimously reported a high level of intuition when using the buffer-based interaction strategy for terms/words tokenization. Furthermore, tests with an LLM—specifically ChatGPT—demonstrated promising semantic correlation rates in generated sentences, comparable to expected sentences.
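The buffer-based tokenization described in the abstract can be sketched roughly as follows. This is a minimal illustrative assumption, not the paper's implementation: `SignBuffer`, the fixed window size, and the majority vote over per-frame predictions are all hypothetical choices standing in for whatever the authors actually used; `remaining()` models the countdown feedback shown to the user.

```python
from collections import Counter, deque

class SignBuffer:
    """Hypothetical sketch of buffer-based sign tokenization: per-frame
    class predictions accumulate in a fixed-size window; when the window
    fills, the majority class is emitted as one token and the buffer
    resets. remaining() is the user-facing countdown (frames left)."""

    def __init__(self, window_size=30):
        self.window_size = window_size
        self.frames = deque()

    def remaining(self):
        # Frames still needed before the current sign is tokenized.
        return self.window_size - len(self.frames)

    def push(self, predicted_label):
        # Add one per-frame prediction; emit a token when the window fills.
        self.frames.append(predicted_label)
        if len(self.frames) == self.window_size:
            token = Counter(self.frames).most_common(1)[0][0]
            self.frames.clear()
            return token
        return None

buf = SignBuffer(window_size=5)
tokens = []
for label in ["OLA", "OLA", "OLA", "OBRIGADO", "OLA",   # first window
              "CASA", "CASA", "CASA", "CASA", "CASA"]:  # second window
    tok = buf.push(label)
    if tok is not None:
        tokens.append(tok)
print(tokens)  # → ['OLA', 'CASA']
```

The emitted token stream would then feed the LLM service, which reorders and inflects the recognized terms into a grammatically coherent sentence.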
Keywords: Deaf-hearing communication; Generative pre-trained transformer (GPT); Inclusion; Large language models (LLM); Long short-term memory (LSTM); Machine learning (ML); Portuguese sign language; Sign language recognition (SLR); Video-based motion analytics
Publisher: MDPI
DOI: 10.3390/jimaging9110235
Publication date: 2023-11-01
Affiliation: Universidade do Minho
Funding: FCT - Fundação para a Ciência e a Tecnologia (C644866286-00000011)
Access: Open access (application/pdf)