Silent speech interface for an AAL scenario
Main author: | Vítor, Nuno Miguel Carreira |
---|---|
Publication date: | 2016 |
Document type: | Master's dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10773/18398 |
Abstract: | Studies on the audio-visual recognition of speech began to emerge in the 1980s. In certain circumstances, however, the audio information cannot be used, due to noisy environments or other constraints, and studies on purely visual speech recognition followed. The launch of the Microsoft Kinect, which combines an RGB camera, a depth sensor, and a microphone at a relatively low price compared to other cameras in its segment, opened new possibilities in the speech recognition field. The launch of the Kinect One in 2014 brought a new RGB-D camera with higher resolution and a more precise "Time of Flight" depth sensor, which allows better results and better accuracy in visual recognition systems. This dissertation was developed with the Microsoft Kinect One and has as its objective visual speech recognition, specifically of commands in Portuguese spoken by the person standing in front of the camera, with the intention of controlling VLC, the most widely used multimedia player in the world and a relevant application for an Ambient Assisted Living (AAL) scenario. The system is designed for an AAL setting: for people with speech impairments, for noisy environments, or simply to provide a better home cinema experience without the need for a remote control. The prototype follows a classic pattern recognition approach, integrating feature extraction and classification. The adopted features were the positions of the lips and the chin. As classifiers, the Support Vector Machine (SVM), Random Forest, Sequential Minimal Optimization (SMO), AdaBoost, and Naive Bayes algorithms were tested.
The prototype achieved an accuracy of around 80 percent over a universe of 8 commands, chosen to be as intuitive as possible given the objective of this dissertation: a working prototype controlling VLC through visual speech recognition. |
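The pipeline the abstract describes (fixed-length feature vectors of lip and chin positions fed to a trained classifier, with Naive Bayes among the algorithms tested) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the command names, feature centres, and data are synthetic placeholders, and a from-scratch Gaussian Naive Bayes stands in for the full set of classifiers compared in the work.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)  # floor avoids div-by-zero
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest Gaussian log-posterior."""
    best, best_score = None, float("-inf")
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = label, score
    return best

# Synthetic "lip/chin position" vectors for two hypothetical commands.
random.seed(0)
def sample(center):
    return [c + random.gauss(0, 0.05) for c in center]

train_X = [sample([0.2, 0.8, 0.5]) for _ in range(50)] + \
          [sample([0.7, 0.3, 0.9]) for _ in range(50)]
train_y = ["play"] * 50 + ["pause"] * 50

model = fit_gaussian_nb(train_X, train_y)
print(predict(model, sample([0.2, 0.8, 0.5])))  # expected: play
```

In the actual prototype the feature vectors would come from Kinect One face tracking rather than random draws, and the per-classifier comparison (SVM, Random Forest, SMO, AdaBoost, Naive Bayes) would be run over the 8-command corpus to select the best-performing model.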
id |
RCAP_ee437d54af7a9f035b11e2dc459e3649 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/18398 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Silent speech interface for an AAL scenario |
author |
Vítor, Nuno Miguel Carreira |
dc.subject.por.fl_str_mv |
Reconhecimento de padrão (pattern recognition); Reconhecimento automático da fala - Meios audiovisuais (automatic speech recognition - audiovisual media) |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016-01-01T00:00:00Z 2016 2017-09-27T12:59:03Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/18398 TID:201588102 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade de Aveiro |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |