Multimodal silent speech interface based on video, depth, surface electromyography and ultrasonic Doppler: Data collection and first recognition results

Bibliographic details
Main author: Freitas, J.
Publication date: 2013
Other authors: Teixeira, A., Dias, M. S.
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/29220
Abstract: Silent Speech Interfaces use data from the speech production process, such as visual information of face movements. However, using a single modality limits the amount of available information. In this study we start to explore the use of multiple data input modalities in order to acquire a more complete representation of the speech production model. We have selected 4 non-invasive modalities (visual data from Video and Depth, Surface Electromyography and Ultrasonic Doppler) and created a system that explores the synchronous combination of all 4, or of a subset of them, into a multimodal Silent Speech Interface (SSI). This paper describes the system design, data collection and first word recognition results. As the first acquired corpora are necessarily small for this SSI, we classify with an example-based recognition approach that uses Dynamic Time Warping followed by a weighted k-Nearest Neighbor classifier. The first classification results using different vocabularies, with digits, a small set of commands related to Ambient Assisted Living and minimal nasal pairs, show that word recognition benefits can be obtained from a multimodal approach.
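The classification scheme named in the abstract (Dynamic Time Warping followed by a weighted k-Nearest Neighbor vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean frame-to-frame cost, the inverse-distance vote weighting, the default k, and the names `dtw_distance` and `classify` are all assumptions made for illustration, and the record does not say how features from the four modalities are extracted or combined.

```python
import math
from collections import defaultdict


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of feature vectors.

    Assumption: the local cost between two frames is their Euclidean distance.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify(query, examples, k=3, eps=1e-9):
    """Weighted k-NN over DTW distances.

    examples: list of (label, sequence) pairs (hypothetical training set).
    Assumption: each of the k nearest examples votes with weight 1/(distance + eps).
    """
    scored = sorted((dtw_distance(query, seq), label) for label, seq in examples)
    votes = defaultdict(float)
    for dist, label in scored[:k]:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)


# Toy one-dimensional "feature" sequences; real inputs would be per-frame
# feature vectors derived from the sensor streams.
examples = [
    ("one", [[0.0], [1.0], [2.0]]),
    ("one", [[0.0], [1.0], [1.0], [2.0]]),
    ("two", [[5.0], [4.0], [3.0]]),
]
predicted = classify([[0.0], [0.0], [1.0], [2.0]], examples)
```

An example-based approach of this kind needs no model training, which is consistent with the abstract's remark that the first corpora are small.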
id RCAP_55e46d9bd1ff0a64de089543f41236c9
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/29220
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.subject.por.fl_str_mv Silent speech interfaces
Multimodal
Video and depth information
Surface electromyography
Ultrasonic doppler sensing
publishDate 2013
dc.date.none.fl_str_mv 2013-01-01T00:00:00Z
2013
2023-08-30T14:09:41Z
2023-08-30T15:06:56Z
dc.type.driver.fl_str_mv conference object
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/29220
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv 2308-457X
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv International Speech and Communication Association
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv mluisa.alvim@gmail.com