Silent speech interface for an AAL scenario
Main author: | Vítor, Nuno Miguel Carreira |
---|---|
Publication date: | 2016 |
Document type: | Master's dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10773/18398 |
Abstract: | Studies on the audio-visual recognition of speech began to emerge in the 1980s. In certain circumstances, however, the audio information cannot be used, due to noisy environments or other constraints, and studies on purely visual speech recognition followed. The launch of the Microsoft Kinect, which combines an RGB camera, a depth sensor, and a microphone at a relatively low price compared to other cameras in its segment, opened new possibilities in the speech recognition field. The launch of the Kinect One in 2014 brought a new RGB-D camera with higher resolution and a more precise "Time of Flight" depth sensor, which allows better results and better accuracy in visual recognition systems. This dissertation was developed with the Microsoft Kinect One and has as its objective visual speech recognition, specifically of commands in Portuguese spoken by the person standing in front of the camera, with the intention of controlling VLC, the most widely used multimedia player in the world and a relevant application for an Ambient Assisted Living (AAL) scenario. The system is designed for an AAL setting: for people with speech impairments, for noisy environments, or simply to provide a better home cinema experience without the need for a remote control. The prototype follows a classic pattern recognition approach, integrating feature extraction and classification. The adopted features were the positions of the lips and the chin. As classifiers, the Support Vector Machine (SVM), Random Forest, Sequential Minimal Optimization (SMO), AdaBoost, and Naive Bayes algorithms were tested.
The prototype achieved an accuracy of around 80 percent over a universe of 8 commands, chosen to be as intuitive as possible given the objective of this dissertation: a working prototype controlling VLC through visual speech recognition. |
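The pipeline the abstract describes (fixed-length feature vectors of lip and chin positions fed to a trained classifier, with Naive Bayes among the algorithms tested) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the command names, feature centres, and data are synthetic placeholders, and a from-scratch Gaussian Naive Bayes stands in for the full set of classifiers compared in the work.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)  # floor avoids div-by-zero
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest Gaussian log-posterior."""
    best, best_score = None, float("-inf")
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = label, score
    return best

# Synthetic "lip/chin position" vectors for two hypothetical commands.
random.seed(0)
def sample(center):
    return [c + random.gauss(0, 0.05) for c in center]

train_X = [sample([0.2, 0.8, 0.5]) for _ in range(50)] + \
          [sample([0.7, 0.3, 0.9]) for _ in range(50)]
train_y = ["play"] * 50 + ["pause"] * 50

model = fit_gaussian_nb(train_X, train_y)
print(predict(model, sample([0.2, 0.8, 0.5])))  # expected: play
```

In the actual prototype the feature vectors would come from Kinect One face tracking rather than random draws, and the per-classifier comparison (SVM, Random Forest, SMO, AdaBoost, Naive Bayes) would be run over the 8-command corpus to select the best-performing model.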
id |
RCAP_ee437d54af7a9f035b11e2dc459e3649 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/18398 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Silent speech interface for an AAL scenario |
author |
Vítor, Nuno Miguel Carreira |
dc.subject.por.fl_str_mv |
Reconhecimento de padrão (pattern recognition); Reconhecimento automático da fala - Meios audiovisuais (automatic speech recognition - audiovisual media) |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016-01-01T00:00:00Z 2016 2017-09-27T12:59:03Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/18398 TID:201588102 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade de Aveiro |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |