A novel word boundary detector based on the teager energy operator for automatic speech recognition
Main author: | Peretta, Igor Santos |
---|---|
Publication date: | 2010 |
Document type: | Master's thesis |
Language: | por |
Source title: | Repositório Institucional da UFU |
Full text: | https://repositorio.ufu.br/handle/123456789/14446 |
Abstract: | This work is part of a larger research project and contributes to the development of a speaker-independent speech recognition system for isolated words from a limited vocabulary. It proposes a novel spoken word boundary detection method, the TEO-based method for Spoken Word Segmentation (TSWS). Built on the Teager Energy Operator (TEO), the TSWS is presented and compared with two widely used speech segmentation methods: Classical, which uses energy and zero-crossing rate computations, and Bottom-up, which is based on adaptive level equalization, energy pulse detection, and endpoint ordering. The TSWS detects spoken word boundaries considerably more precisely than the Classical (67.8% error reduction) and Bottom-up (61.2% error reduction) methods. A complete isolated spoken word recognition system (ISWRS) is also presented. The ISWRS uses Mel-frequency Cepstral Coefficients (MFCC) as the parametric representation of the speech signal and a standard multilayer feed-forward network (MLP) as the recognizer. Two sets of tests were conducted: one with a database of 50 different words totaling 10,350 utterances, and another with a smaller vocabulary of 17 words totaling 3,519 utterances. Two thirds of the utterances formed the ISWRS training set and one third the testing set. The tests were run with each of the TSWS, Classical, and Bottom-up methods in the ISWRS speech segmentation stage. With TSWS, the ISWRS achieved a 99.0% success rate on generalization tests, against 98.6% for the Classical and Bottom-up methods. White Gaussian noise was then artificially added to the ISWRS inputs to reach a signal-to-noise ratio of 15 dB. With noise present, ISWRS performance on generalization tests fell to 96.5%, 93.6%, and 91.4% when using the TSWS, Classical, and Bottom-up methods, respectively. |
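
For readers unfamiliar with the operator named in the abstract, the following is a minimal sketch of the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], together with one common way to add white Gaussian noise at a target signal-to-noise ratio such as the 15 dB mentioned above. It is not the TSWS method itself, whose thresholds and decision rules are not given in this record. The function names `teager_energy` and `add_white_noise`, the fixed random seed, and the test tone are illustrative assumptions.

```python
import math
import random


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one per interior sample.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def add_white_noise(x, snr_db, rng=None):
    """Add zero-mean white Gaussian noise scaled to a target SNR in dB.

    The noise variance is derived from the mean power of the clean signal.
    The default seed is an assumption made only for reproducibility.
    """
    rng = rng or random.Random(0)
    signal_power = sum(v * v for v in x) / len(x)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


# Hypothetical test signal: a pure tone A * cos(omega * n).
A, omega = 0.5, 0.3
tone = [A * math.cos(omega * n) for n in range(2000)]

# For a pure tone the operator is constant: A^2 * sin(omega)^2.
psi = teager_energy(tone)

# Noisy copy of the tone at a nominal 15 dB SNR.
noisy = add_white_noise(tone, snr_db=15.0)
```

For a pure tone the operator output depends on both amplitude and frequency, which is the property usually cited for its use in separating speech from background. A boundary detector would still need a framing and thresholding stage on top of `psi`, which this sketch leaves out.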
id |
UFU_bf183fc5c2e488fb8d674f0a6b21e7a4 |
---|---|
oai_identifier_str |
oai:repositorio.ufu.br:123456789/14446 |
network_acronym_str |
UFU |
network_name_str |
Repositório Institucional da UFU |
repository_id_str |
|
spelling |
A novel word boundary detector based on the teager energy operator for automatic speech recognition; Speech segmentation; Spoken word boundary detection; TEO; Speaker-independent; Isolated words; Speech recognition system; MFCC; MLP; Automatic speech recognition; Artificial neural networks; Mel-frequency cepstral coefficients; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; Mestre em Ciências (Master of Science); Universidade Federal de Uberlândia; BR; Programa de Pós-graduação em Engenharia Elétrica; Engenharias; UFU; Yamanaka, Keiji (http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4798494D8); Nomura, Shigueo (http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4723707A0); Carrijo, Gilberto Arantes (http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4781864Y0); Yehia, Hani Camille (http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4785031D7); Peretta, Igor Santos; 2016-06-22T18:38:39Z; 2011-03-22; 2016-06-22T18:38:39Z; 2010-12-21; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; application/pdf; PERETTA, Igor Santos. A novel word boundary detector based on the teager energy operator for automatic speech recognition. 2010. 125 f. Dissertação (Mestrado em Engenharias) - Universidade Federal de Uberlândia, Uberlândia, 2010.; https://repositorio.ufu.br/handle/123456789/14446; por; info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFU; instname:Universidade Federal de Uberlândia (UFU); instacron:UFU; 2016-06-23T06:56:00Z; oai:repositorio.ufu.br:123456789/14446; Repositório Institucional; ONG; http://repositorio.ufu.br/oai/request; diinf@dirbi.ufu.br; opendoar:2016-06-23T06:56; Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU); false |
dc.title.none.fl_str_mv |
A novel word boundary detector based on the teager energy operator for automatic speech recognition |
title |
A novel word boundary detector based on the teager energy operator for automatic speech recognition |
author |
Peretta, Igor Santos |
author_facet |
Peretta, Igor Santos |
author_role |
author |
dc.contributor.none.fl_str_mv |
Yamanaka, Keiji (http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4798494D8); Nomura, Shigueo (http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4723707A0); Carrijo, Gilberto Arantes (http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4781864Y0); Yehia, Hani Camille (http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4785031D7) |
dc.contributor.author.fl_str_mv |
Peretta, Igor Santos |
dc.subject.por.fl_str_mv |
Speech segmentation; Spoken word boundary detection; TEO; Speaker-independent; Isolated words; Speech recognition system; MFCC; MLP; Automatic speech recognition; Artificial neural networks; Mel-frequency cepstral coefficients; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA |
publishDate |
2010 |
dc.date.none.fl_str_mv |
2010-12-21 2011-03-22 2016-06-22T18:38:39Z 2016-06-22T18:38:39Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
PERETTA, Igor Santos. A novel word boundary detector based on the teager energy operator for automatic speech recognition. 2010. 125 f. Dissertação (Mestrado em Engenharias) - Universidade Federal de Uberlândia, Uberlândia, 2010. https://repositorio.ufu.br/handle/123456789/14446 |
identifier_str_mv |
PERETTA, Igor Santos. A novel word boundary detector based on the teager energy operator for automatic speech recognition. 2010. 125 f. Dissertação (Mestrado em Engenharias) - Universidade Federal de Uberlândia, Uberlândia, 2010. |
url |
https://repositorio.ufu.br/handle/123456789/14446 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Federal de Uberlândia BR Programa de Pós-graduação em Engenharia Elétrica Engenharias UFU |
publisher.none.fl_str_mv |
Universidade Federal de Uberlândia BR Programa de Pós-graduação em Engenharia Elétrica Engenharias UFU |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFU instname:Universidade Federal de Uberlândia (UFU) instacron:UFU |
instname_str |
Universidade Federal de Uberlândia (UFU) |
instacron_str |
UFU |
institution |
UFU |
reponame_str |
Repositório Institucional da UFU |
collection |
Repositório Institucional da UFU |
repository.name.fl_str_mv |
Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU) |
repository.mail.fl_str_mv |
diinf@dirbi.ufu.br |