Automatic speech recognition, with large vocabulary, robustness, independence of speaker and multilingual processing

Caon, Daniel Régis Sarmento

Bibliographic details
Main author:	Caon, Daniel Régis Sarmento
Publication date:	2010
Document type:	Master's dissertation
Language:	English
Source title:	Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
Full text:	http://repositorio.ufes.br/handle/10/6390
Abstract:	This work aims to provide automatic cognitive assistance, through a speech interface, to elderly people who live alone and may face at-risk situations. Distress expressions and voice commands make up part of the target vocabulary for speech recognition. Throughout the work, the large-vocabulary continuous speech recognition system Julius is used together with the Hidden Markov Model Toolkit (HTK). The main features of Julius are described, as is the modification made to it. This modification is part of the contribution of this work, as is the detection of distress expressions (utterances that suggest an emergency). Four target languages were planned for recognition: French, Dutch, Spanish and English. Theoretical studies and experiments were carried out for the languages in that order (determined by data availability and by the locations of the system-integration scenarios) to meet the requirements of each new configuration. This work covers the French and Dutch studies. Initial experiments, in French, used adaptation of hidden Markov models and were evaluated by cross-validation. For a new demonstration in Dutch, acoustic and language models were built, and the system was integrated with auxiliary modules such as a voice activity detector and a dialogue system. After acoustic adaptation to a specific speaker, and with language models created for a specific demonstration scenario, the Dutch acoustic models achieved a sentence accuracy rate of 86.39%. On the same data, the semantic sentence accuracy rate was 94.44%.
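The abstract reports two sentence-level figures: sentence accuracy and semantic sentence accuracy. The sketch below shows one plausible way to compute them from reference and recognized transcripts; it is not the dissertation's evaluation code. Sentence accuracy is taken here as the percentage of utterances whose word sequence is recognized exactly, in the spirit of the sentence-level %Correct reported by HTK's HResults. The semantic criterion is an assumption, since the abstract does not define it: an utterance counts as correct when a hypothetical `keyword_meaning` function maps the reference and the hypothesis to the same command label. The `KEYWORDS` table and the example sentences are illustrative only, not data from the dissertation.

```python
from typing import Callable, Sequence

# Hypothetical keyword table mapping words to command labels (illustration only).
KEYWORDS = {"help": "DISTRESS", "light": "LIGHTS"}


def keyword_meaning(utterance: str) -> str:
    """Hypothetical semantic mapping: the first known keyword wins, else UNKNOWN."""
    for word in utterance.lower().split():
        if word in KEYWORDS:
            return KEYWORDS[word]
    return "UNKNOWN"


def _check(refs: Sequence[str], hyps: Sequence[str]) -> None:
    # Both lists must describe the same, non-empty set of utterances.
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    if not refs:
        raise ValueError("need at least one utterance")


def sentence_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Percent of utterances whose case-folded word sequence matches the reference exactly."""
    _check(refs, hyps)
    hits = sum(r.lower().split() == h.lower().split() for r, h in zip(refs, hyps))
    return 100.0 * hits / len(refs)


def semantic_accuracy(
    refs: Sequence[str],
    hyps: Sequence[str],
    to_meaning: Callable[[str], str] = keyword_meaning,
) -> float:
    """Percent of utterances whose hypothesis maps to the same meaning as the reference."""
    _check(refs, hyps)
    hits = sum(to_meaning(r) == to_meaning(h) for r, h in zip(refs, hyps))
    return 100.0 * hits / len(refs)


if __name__ == "__main__":
    refs = ["turn on the light", "help me please", "i need help"]
    hyps = ["turn on a light", "help me please", "i need hell"]
    print(f"sentence accuracy: {sentence_accuracy(refs, hyps):.2f}%")  # 33.33%
    print(f"semantic accuracy: {semantic_accuracy(refs, hyps):.2f}%")  # 66.67%
```

In this toy run, "turn on a light" is a sentence error but still maps to the same command, which is why the semantic rate exceeds the exact-match rate, mirroring the gap between the two figures in the abstract. One consequence of this simple rule is that two utterances with no known keyword both map to UNKNOWN and count as a semantic match; a real evaluation would likely need a stricter criterion.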

Item metadata

id	UFES_3cd016becab9a444f1e80e47509b4618
oai_identifier_str	oai:repositorio.ufes.br:10/6390
network_acronym_str	UFES
network_name_str	Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
repository_id_str	2108
dc.title.none.fl_str_mv	Automatic speech recognition, with large vocabulary, robustness, independence of speaker and multilingual processing
author	Caon, Daniel Régis Sarmento
author_role	author
dc.contributor.advisor-co1.fl_str_mv	Andreão, Rodrigo Varejão
dc.contributor.advisor1.fl_str_mv	Rauber, Thomas Walter
dc.contributor.author.fl_str_mv	Caon, Daniel Régis Sarmento
dc.contributor.referee1.fl_str_mv	Varejão, Flávio Miguel
dc.contributor.referee2.fl_str_mv	Ynoguti, Carlos Alberto
dc.subject.eng.fl_str_mv	Automatic speech recognition; Hidden Markov models; Acoustic modeling
dc.subject.por.fl_str_mv	HTK; Julius; K-Fold; Speech signal processing; Hidden Markov models; Acoustic modeling
dc.subject.cnpq.fl_str_mv	Computer Science
dc.subject.br-rjbn.none.fl_str_mv	Signal processing; User interfaces (Computer systems); Automatic voice recognition; Pattern recognition systems
dc.subject.udc.none.fl_str_mv	004
publishDate	2010
dc.date.issued.fl_str_mv	2010-08-27
dc.date.available.fl_str_mv	2011-03-23 2016-12-23T14:33:42Z
dc.date.accessioned.fl_str_mv	2016-12-23T14:33:42Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
dc.identifier.citation.fl_str_mv	CAON, Daniel Régis Sarmento. Automatic speech recognition, with large vocabulary, robustness, independence of speaker and multilingual processing. 2010. 70 leaves. Master's dissertation (Informatics), Universidade Federal do Espírito Santo, Centro Tecnológico, Vitória, 2010.
dc.identifier.uri.fl_str_mv	http://repositorio.ufes.br/handle/10/6390
dc.language.iso.fl_str_mv	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	Text
dc.publisher.none.fl_str_mv	Universidade Federal do Espírito Santo; Master's Program in Informatics
dc.publisher.program.fl_str_mv	Graduate Program in Informatics
dc.publisher.initials.fl_str_mv	UFES
dc.publisher.country.fl_str_mv	BR
dc.publisher.department.fl_str_mv	Centro Tecnológico
dc.source.none.fl_str_mv	reponame:Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) instname:Universidade Federal do Espírito Santo (UFES) instacron:UFES
bitstream.url.fl_str_mv	http://repositorio.ufes.br/bitstreams/c0ea455c-a9de-425a-9609-f6346ea82bc8/download
bitstream.checksum.fl_str_mv	67b557539f4bc5b354bc90066e805215
bitstream.checksumAlgorithm.fl_str_mv	MD5
repository.name.fl_str_mv	Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) - Universidade Federal do Espírito Santo (UFES)
