Multimodal silent speech interface based on video, depth, surface electromyography and ultrasonic Doppler: Data collection and first recognition results
Main author: | Freitas, J. |
---|---|
Publication date: | 2013 |
Other authors: | Teixeira, A.; Dias, M. S. |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10071/29220 |
Abstract: | Silent Speech Interfaces use data from the speech production process, such as visual information of face movements. However, using a single modality limits the amount of available information. In this study we start to explore the use of multiple data input modalities in order to acquire a more complete representation of the speech production model. We have selected four non-invasive modalities – visual data from video and depth, surface electromyography and ultrasonic Doppler – and created a system that explores the synchronous combination of all four, or of a subset of them, into a multimodal Silent Speech Interface (SSI). This paper describes the system design, data collection and first word recognition results. As the first corpora acquired for this SSI are necessarily small, we use for classification an example-based recognition approach: Dynamic Time Warping followed by a weighted k-Nearest Neighbor classifier. The first classification results, on vocabularies containing digits, a small set of commands related to Ambient Assisted Living, and minimal nasal pairs, show that word recognition benefits can be obtained from a multimodal approach. |
Keywords: | Silent speech interfaces; Multimodal; Video and depth information; Surface electromyography; Ultrasonic Doppler sensing |
Document type: | Conference object (published version) |
ISSN: | 2308-457X |
Access rights: | Open access |
Format: | application/pdf |
Publisher: | International Speech Communication Association (ISCA) |
Institution: | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
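
The recognition pipeline summarised in the abstract above (synchronous combination of the modality streams, Dynamic Time Warping over per-utterance feature sequences, and a distance-weighted k-Nearest Neighbor vote over stored word templates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each utterance has already been reduced to one fixed-rate feature matrix per modality, and the function names (`fuse_modalities`, `dtw_distance`, `weighted_knn_classify`), parameter choices and stand-in data are purely illustrative.

```python
# Minimal sketch (not the paper's code): frame-level fusion of synchronised
# modality streams, DTW alignment, and a distance-weighted k-NN vote.
import numpy as np


def fuse_modalities(streams):
    """Concatenate per-frame features from synchronised modality streams.

    Each stream has shape (n_frames, n_features_m); streams are truncated
    to the shortest one before frame-wise concatenation.
    """
    n = min(s.shape[0] for s in streams)
    return np.concatenate([s[:n] for s in streams], axis=1)


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of feature frames."""
    n, m = a.shape[0], b.shape[0]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # frame-to-frame Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]


def weighted_knn_classify(query, templates, labels, k=3):
    """Label a query utterance by a distance-weighted vote of its k nearest templates."""
    dists = np.array([dtw_distance(query, t) for t in templates])
    votes = {}
    for idx in np.argsort(dists)[:k]:
        weight = 1.0 / (dists[idx] + 1e-8)            # closer templates get larger votes
        votes[labels[idx]] = votes.get(labels[idx], 0.0) + weight
    return max(votes, key=votes.get)


# Toy usage with random stand-in features (shapes and labels are arbitrary).
rng = np.random.default_rng(0)

def make_utterance():
    return fuse_modalities([rng.normal(size=(30, 8)),   # e.g. video features
                            rng.normal(size=(30, 6)),   # e.g. depth features
                            rng.normal(size=(30, 4)),   # e.g. surface EMG features
                            rng.normal(size=(30, 2))])  # e.g. ultrasonic Doppler features

templates = [make_utterance() for _ in range(10)]
labels = ["one", "two"] * 5
print(weighted_knn_classify(make_utterance(), templates, labels, k=3))
```

A template-matching scheme of this kind needs no statistical model training beyond storing labelled examples, which is why it suits the small first corpora mentioned in the abstract; the distance weighting simply lets closer templates count more than farther ones in the vote.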