Differentiable Measures for Speech Spectral Modeling

Arjona Ramírez, Miguel; Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes

Bibliographic details
Main author:	Arjona Ramírez, Miguel
Publication date:	2022
Other authors:	Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes
Document type:	Article
Language:	eng
Source title:	Repositório Institucional da UFLA
Full text:	http://repositorio.ufla.br/jspui/handle/1/50638
Abstract:	Autoregressive models for the envelope of speech power spectral densities (PSDs) are refined by the self-supervised spectral learning machine (S3LM), which is provided with differentiable spectral objective functions. These include the Itakura-Saito divergence (ISD), the Kullback-Leibler divergence (KLD), the reverse KLD (RKLD) and the log spectral distortion (LSD), which display the more significant results. However, in order to assess the models more perceptually, a method is proposed based upon perturbations around perfect-reconstruction analysis-synthesis configurations. In the cross-excitation analysis-synthesis assessment (CEASA) method, the residual signals generated by the analysis filters of the spectral models are injected as excitation into the synthesis filters derived from the same and from other models. The outputs are evaluated by the perceptual evaluation of speech quality (PESQ) and the Itakura divergence (ID), which are averaged over a set of models obtained using the objective functions mentioned above. The results show superior performance when the RKLD is used as the loss function for the estimation of the spectral models, with the ISD ranking close behind. The focus of these divergences on the spectral peaks is argued to be the most important factor for this behavior. Specifically, using the PESQ scores obtained with CEASA, the RKLD loss is found to improve the performance by 1.0%, 4.0% and 19.3% with respect to the open-loop analysis, the KLD and the LSD models, respectively, while the corresponding improvements for the ISD loss are 0.1%, 3.0% and 18.2%, and the RKLD models outperform the ISD models by 1.0% on average. Even though the spectral measures alone cannot unequivocally distinguish the better of the two, CEASA is shown to be sensitive enough to distinguish their performances. In summary, the learning machine S3LM fits models for the short-term spectral envelope of speech and, for the evaluation of its performance under several differentiable loss...
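The four objective functions named in the abstract can be illustrated with a minimal NumPy sketch. The definitions below are common textbook forms (in particular the generalized, unnormalized KLD) and are an assumption, since this record does not give the paper's exact formulas or normalization. The function names `ar_psd`, `isd`, `kld`, `rkld` and `lsd` and the example coefficients are hypothetical and are not taken from the paper.

```python
import numpy as np


def ar_psd(a, gain=1.0, n_freq=256):
    """PSD of an all-pole (autoregressive) model, gain^2 / |A(e^{jw})|^2.

    `a` holds the coefficients of A(z) = a[0] + a[1] z^-1 + ..., with a[0] = 1.
    Returns n_freq samples on a uniform grid from 0 to pi.
    """
    A = np.fft.rfft(a, n=2 * (n_freq - 1))
    return gain**2 / np.abs(A) ** 2


def isd(p, q):
    """Itakura-Saito divergence between positive spectra p (reference) and q (model)."""
    r = p / q
    return float(np.mean(r - np.log(r) - 1.0))


def kld(p, q):
    """Generalized Kullback-Leibler divergence (assumed form, no normalization)."""
    return float(np.mean(p * np.log(p / q) - p + q))


def rkld(p, q):
    """Reverse KLD: the same divergence with the arguments swapped."""
    return kld(q, p)


def lsd(p, q):
    """Log spectral distortion in dB (root-mean-square log-spectral difference)."""
    d = 10.0 * np.log10(p / q)
    return float(np.sqrt(np.mean(d**2)))


if __name__ == "__main__":
    # Two hypothetical first-order AR envelopes standing in for a
    # reference spectrum and a fitted model.
    p = ar_psd([1.0, -0.9])
    q = ar_psd([1.0, -0.8])
    for name, fn in [("ISD", isd), ("KLD", kld), ("RKLD", rkld), ("LSD", lsd)]:
        print(name, fn(p, q))
```

In this sketch the ISD depends only on the ratio p/q, so it does not change when both spectra are scaled by the same gain, and each measure is zero when the two spectra coincide. All four expressions are smooth in the model spectrum as long as both spectra stay strictly positive, which is what makes them usable as gradient-based loss functions.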

Item metadata

id	UFLA_60d6c80f6569284d3f17d2b41882207e
oai_identifier_str	oai:localhost:1/50638
network_acronym_str	UFLA
network_name_str	Repositório Institucional da UFLA
repository_id_str
dc.title.none.fl_str_mv	Differentiable Measures for Speech Spectral Modeling
author	Arjona Ramírez, Miguel
author_facet	Arjona Ramírez, Miguel; Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes
author_role	author
author2	Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes
author2_role	author author author
dc.contributor.author.fl_str_mv	Arjona Ramírez, Miguel; Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes
dc.subject.por.fl_str_mv	Autoregressive processes; Machine learning algorithms; Prediction methods; Self-supervised learning; Speech analysis; Spectral analysis
publishDate	2022
dc.date.none.fl_str_mv	2022-07-18T20:53:34Z 2022-07-18T20:53:34Z 2022-02
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	ARJONA RAMÍREZ, M. et al. Differentiable Measures for Speech Spectral Modeling. IEEE Access, [S.I.], v. 10, p. 17609-17618, 2022. DOI: 10.1109/ACCESS.2022.3150728. http://repositorio.ufla.br/jspui/handle/1/50638
identifier_str_mv	ARJONA RAMÍREZ, M. et al. Differentiable Measures for Speech Spectral Modeling. IEEE Access, [S.I.], v. 10, p. 17609-17618, 2022. DOI: 10.1109/ACCESS.2022.3150728.
url	http://repositorio.ufla.br/jspui/handle/1/50638
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by/4.0/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Institute of Electrical and Electronics Engineers (IEEE)
publisher.none.fl_str_mv	Institute of Electrical and Electronics Engineers (IEEE)
dc.source.none.fl_str_mv	IEEE Access reponame:Repositório Institucional da UFLA instname:Universidade Federal de Lavras (UFLA) instacron:UFLA
instname_str	Universidade Federal de Lavras (UFLA)
instacron_str	UFLA
institution	UFLA
reponame_str	Repositório Institucional da UFLA
collection	Repositório Institucional da UFLA
repository.name.fl_str_mv	Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv	nivaldo@ufla.br || repositorio.biblioteca@ufla.br
_version_	1807835194368983040
