Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa

Machado, Mateus Lichfett

Bibliographic details
Main author:	Machado, Mateus Lichfett
Publication date:	2016
Document type:	Master's thesis
Language:	Portuguese (por)
Source title:	Repositório Institucional da UFU
Full text:	https://repositorio.ufu.br/handle/123456789/20710 http://dx.doi.org/10.14393/ufu.di.2018.82
Abstract:	This research investigates and develops a robust automatic voice recognition system that uses Mel Frequency Cepstral Coefficients (MFCC) to extract the acoustic properties of speech signals and Vector Quantization (VQ) for classification and pattern recognition. Dynamic features, normalization techniques, and voice activity detection were added to these techniques to improve the system. Two types of dynamic coefficients were tested: Delta-Delta Coefficients (DDC) and Shifted Delta Coefficients (SDC). Three normalization techniques were tested: Cepstral Mean and Variance Normalization (CMVN), Windowed Cepstral Mean and Variance Normalization (WCMVN), and Short-Time Gaussianization (STG). Voice Activity Detection (VAD) was implemented according to the algorithm developed by Qiang He, which combines the Short-Time Energy (STE) and Zero Crossing Rate (ZCR) methods. The research examines the system's ability to perform several tasks: recognition of words or commands; speaker identification; and the combination of the first two. It also investigates which configuration of the tested techniques performs these tasks best, analyzing the system's efficiency. Five experiments were conducted in a noise-controlled environment with eight participants. Four of them had their voices used in training to build the databases; the other four took part only in the test phase, alongside the first four. A total of 144 speech samples were captured: 24 were used to build the databases and 120 were used in the test phase. To ensure the integrity of the experiments, the training and test samples were mirrored so that they could be processed according to the configuration of each experiment. The results confirmed that these techniques are capable of performing the tasks for which the system was proposed, and the best configuration found was MFCC and VQ combined with VAD, Shifted Delta Coefficients, and Short-Time Gaussianization normalization.
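The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: MFCC extraction is omitted, the decision logic of Qiang He's VAD algorithm is not reproduced (only the per-frame STE and ZCR measures it combines are shown), and all function names, the toy codebooks, and the toy feature vectors are assumptions made for illustration. CMVN is applied here over the whole utterance, and classification picks the codebook with the lowest average distortion.

```python
import math

def frames(signal, size, hop):
    """Split samples into overlapping frames; an incomplete tail is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def short_time_energy(frame):
    """Short-Time Energy (STE): sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Zero Crossing Rate (ZCR): fraction of adjacent sample pairs whose
    signs differ. The frame must contain at least two samples."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def cmvn(vectors):
    """Cepstral Mean and Variance Normalization over one utterance: each
    coefficient is shifted to zero mean and scaled to unit variance."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # A constant coefficient has zero deviation; fall back to 1.0 to avoid
    # dividing by zero.
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n) or 1.0
            for d in range(dim)]
    return [[(v[d] - means[d]) / stds[d] for d in range(dim)] for v in vectors]

def avg_distortion(vectors, codebook):
    """Mean squared Euclidean distance from each vector to its nearest codeword."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(min(dist2(v, c) for c in codebook) for v in vectors) / len(vectors)

def classify(vectors, codebooks):
    """Return the label whose VQ codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda label: avg_distortion(vectors, codebooks[label]))

# Toy example with made-up two-dimensional "feature vectors" and codebooks.
codebooks = {"yes": [[0.0, 0.0], [1.0, 1.0]], "no": [[5.0, 5.0], [6.0, 6.0]]}
features = [[0.1, -0.1], [0.9, 1.2]]
print(classify(features, codebooks))
```

In a real system the codebooks would be trained from the database samples (for example with a clustering procedure), one per word or per speaker, and the feature vectors would be normalized MFCCs computed on the frames that the VAD keeps.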

Item metadata

id	UFU_40d823cbcae547915cc63e73f7222911
oai_identifier_str	oai:repositorio.ufu.br:123456789/20710
network_acronym_str	UFU
network_name_str	Repositório Institucional da UFU
repository_id_str
spelling	Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa; Implementation of an automatic voice recognition system using MFCC and Vector Quantization techniques with dynamic, standardization and active voice detection attributes; Reconhecimento Automático de Voz; Mel Frequency Cepstral Coefficients; Quantização Vetorial; Automatic Speech Recognition; Vector Quantization; Engenharia mecânica; Reconhecimento automático da voz; Voz - Codificação; Sistemas de reconhecimento de padrões; CNPQ::ENGENHARIAS::ENGENHARIA MECANICA; CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; Dissertação (Mestrado); Universidade Federal de Uberlândia; Brasil; Programa de Pós-graduação em Engenharia Mecânica; Duarte, Marcus Antônio Viana; Teodoro, Elias Bitencourt; Netto, Sergio Lima; Machado, Mateus Lichfett; 2018-02-20T18:48:19Z; 2018-02-20T18:48:19Z; 2016-04-18; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; MACHADO, Mateus Lichfett. Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa. 2016. 149 f. Dissertação (Mestrado em Engenharia Mecânica) - Universidade Federal de Uberlândia, Uberlândia, 2016.; https://repositorio.ufu.br/handle/123456789/20710; http://dx.doi.org/10.14393/ufu.di.2018.82; por; info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFU; instname:Universidade Federal de Uberlândia (UFU); instacron:UFU; 2021-09-16T20:20:50Z; oai:repositorio.ufu.br:123456789/20710; Repositório Institucional; ONG; http://repositorio.ufu.br/oai/request; diinf@dirbi.ufu.br; opendoar:2021-09-16T20:20:50; Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU); false
dc.title.none.fl_str_mv	Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa; Implementation of an automatic voice recognition system using MFCC and Vector Quantization techniques with dynamic, standardization and active voice detection attributes
title	Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa
spellingShingle	Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa; Machado, Mateus Lichfett; Reconhecimento Automático de Voz; Mel Frequency Cepstral Coefficients; Quantização Vetorial; Automatic Speech Recognition; Vector Quantization; Engenharia mecânica; Reconhecimento automático da voz; Voz - Codificação; Sistemas de reconhecimento de padrões; CNPQ::ENGENHARIAS::ENGENHARIA MECANICA
author	Machado, Mateus Lichfett
author_facet	Machado, Mateus Lichfett
author_role	author
dc.contributor.none.fl_str_mv	Duarte, Marcus Antônio Viana; Teodoro, Elias Bitencourt; Netto, Sergio Lima
dc.contributor.author.fl_str_mv	Machado, Mateus Lichfett
dc.subject.por.fl_str_mv	Reconhecimento Automático de Voz; Mel Frequency Cepstral Coefficients; Quantização Vetorial; Automatic Speech Recognition; Vector Quantization; Engenharia mecânica; Reconhecimento automático da voz; Voz - Codificação; Sistemas de reconhecimento de padrões; CNPQ::ENGENHARIAS::ENGENHARIA MECANICA
topic	Reconhecimento Automático de Voz; Mel Frequency Cepstral Coefficients; Quantização Vetorial; Automatic Speech Recognition; Vector Quantization; Engenharia mecânica; Reconhecimento automático da voz; Voz - Codificação; Sistemas de reconhecimento de padrões; CNPQ::ENGENHARIAS::ENGENHARIA MECANICA
publishDate	2016
dc.date.none.fl_str_mv	2016-04-18 2018-02-20T18:48:19Z 2018-02-20T18:48:19Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	MACHADO, Mateus Lichfett. Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa. 2016. 149 f. Dissertação (Mestrado em Engenharia Mecânica) - Universidade Federal de Uberlândia, Uberlândia, 2016. https://repositorio.ufu.br/handle/123456789/20710 http://dx.doi.org/10.14393/ufu.di.2018.82
identifier_str_mv	MACHADO, Mateus Lichfett. Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa. 2016. 149 f. Dissertação (Mestrado em Engenharia Mecânica) - Universidade Federal de Uberlândia, Uberlândia, 2016.
url	https://repositorio.ufu.br/handle/123456789/20710 http://dx.doi.org/10.14393/ufu.di.2018.82
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Uberlândia; Brasil; Programa de Pós-graduação em Engenharia Mecânica
publisher.none.fl_str_mv	Universidade Federal de Uberlândia; Brasil; Programa de Pós-graduação em Engenharia Mecânica
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFU instname:Universidade Federal de Uberlândia (UFU) instacron:UFU
instname_str	Universidade Federal de Uberlândia (UFU)
instacron_str	UFU
institution	UFU
reponame_str	Repositório Institucional da UFU
collection	Repositório Institucional da UFU
repository.name.fl_str_mv	Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU)
repository.mail.fl_str_mv	diinf@dirbi.ufu.br
_version_	1813711598141833216
