Técnicas para conversão de orador em sinais de voz

Costa, Victor Pereira da

Bibliographic details
Main author:	Costa, Victor Pereira da
Publication date:	2017
Document type:	Master's thesis
Language:	por (Portuguese)
Source title:	Repositório Institucional da UFRJ
Full text:	http://hdl.handle.net/11422/6206
Abstract:	This work presents a voice conversion system, which transforms a speech signal uttered by one speaker into a signal that sounds as if it had been uttered by another speaker, without changing the textual content of the speech or characteristics such as emotion and emphasis. The main objective is to compare the performance of different conversion methods. To this end, a unified voice conversion system was implemented, comprising the analysis, conversion, and synthesis steps needed to transform the speaker. Four voice conversion techniques were evaluated: three from the literature, based on Gaussian mixture models, hidden Markov models, and feedforward neural networks; and a novel one, based on recurrent neural networks. Two methods for generating the excitation used in the synthesis step were also implemented: one using a parametric pulse trained on the speech signals, and one using the PSOLA algorithm. Two experiments were conducted on this system to assess the conversion quality of each method: one measuring the distance between the cepstra of the signals, and the other employing a speaker recognition system. In these experiments, the conversion based on Gaussian mixture models yielded the best results, although all techniques performed comparably.

Item metadata

id	UFRJ_4f541e276fa05dc81914ab4d786b1968
oai_identifier_str	oai:pantheon.ufrj.br:11422/6206
network_acronym_str	UFRJ
network_name_str	Repositório Institucional da UFRJ
repository_id_str
dc.title.none.fl_str_mv	Técnicas para conversão de orador em sinais de voz
title	Técnicas para conversão de orador em sinais de voz
spellingShingle	Técnicas para conversão de orador em sinais de voz Costa, Victor Pereira da Processamento digital de voz Processamento de sinais Reconhecimento de voz CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA::MEDIDAS ELETRICAS, MAGNETICAS E ELETRONICAS INSTRUMENTACAO::INSTRUMENTACAO ELETRONICA
title_short	Técnicas para conversão de orador em sinais de voz
title_full	Técnicas para conversão de orador em sinais de voz
title_fullStr	Técnicas para conversão de orador em sinais de voz
title_full_unstemmed	Técnicas para conversão de orador em sinais de voz
title_sort	Técnicas para conversão de orador em sinais de voz
author	Costa, Victor Pereira da
author_facet	Costa, Victor Pereira da
author_role	author
dc.contributor.none.fl_str_mv	Biscainho, Luiz Wagner Pereira http://lattes.cnpq.br/3798063417184939 Lima Netto, Sergio Lima, Amaro Azevedo de
dc.contributor.author.fl_str_mv	Costa, Victor Pereira da
dc.subject.por.fl_str_mv	Processamento digital de voz Processamento de sinais Reconhecimento de voz CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA::MEDIDAS ELETRICAS, MAGNETICAS E ELETRONICAS INSTRUMENTACAO::INSTRUMENTACAO ELETRONICA
topic	Processamento digital de voz Processamento de sinais Reconhecimento de voz CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA::MEDIDAS ELETRICAS, MAGNETICAS E ELETRONICAS INSTRUMENTACAO::INSTRUMENTACAO ELETRONICA
description	Presents a voice conversion system, a system that transforms a voice signal spoken by some speaker into a signal that sounds like it was spoken by another speaker, without changing the textual content of the speech or changing information like emotion or emphasis. The main objective of this work is to compare the conversion as done by different methods. To accomplish this, a unified voice conversion system containing the analysis, conversion and synthesis steps necessary to transform the speaker was implemented. Four voice conversion techniques, three from the literature, based on Gaussian mixture models, hidden Markov models and feed forward neural networks, and one novel based on recurrent neural networks, were evaluated. Two methods to generate the excitation used in the synthesis step were also implemented, one utilizing a parametric pulse trained on the speech signals, and one utilizing the PSOLA algorithm. On this system a couple of experiments were conducted to assess the conversion quality of each method: one measuring the distance between the cepstra of the signals, and the other employing a speaker recognition system. In these experiments the conversion based on Gaussian mixture models yielded the best results, but all techniques were relatively close in terms of performance.
publishDate	2017
dc.date.none.fl_str_mv	2017-03 2019-01-22T13:29:55Z 2023-12-21T03:05:44Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/11422/6206
url	http://hdl.handle.net/11422/6206
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal do Rio de Janeiro Brasil Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia Programa de Pós-Graduação em Engenharia Elétrica UFRJ
publisher.none.fl_str_mv	Universidade Federal do Rio de Janeiro Brasil Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia Programa de Pós-Graduação em Engenharia Elétrica UFRJ
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFRJ instname:Universidade Federal do Rio de Janeiro (UFRJ) instacron:UFRJ
instname_str	Universidade Federal do Rio de Janeiro (UFRJ)
instacron_str	UFRJ
institution	UFRJ
reponame_str	Repositório Institucional da UFRJ
collection	Repositório Institucional da UFRJ
repository.name.fl_str_mv	Repositório Institucional da UFRJ - Universidade Federal do Rio de Janeiro (UFRJ)
repository.mail.fl_str_mv	pantheon@sibi.ufrj.br
_version_	1815455980724420608
