Differentiable Measures for Speech Spectral Modeling

Bibliographic details
Main author: Arjona Ramírez, Miguel
Publication date: 2022
Other authors: Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes
Document type: Article
Language: eng
Source title: Repositório Institucional da UFLA
Full text: http://repositorio.ufla.br/jspui/handle/1/50638
Abstract: Autoregressive models for the envelope of speech power spectral densities (PSDs) are refined by the self-supervised spectral learning machine (S3LM) provided with differentiable spectral objective functions, including the Itakura-Saito divergence (ISD), the Kullback-Leibler divergence (KLD), the reverse KLD (RKLD) and the log spectral distortion (LSD), which display more significant results. However, to assess the models in a more perceptually relevant way, a method is proposed that is based on perturbations around perfect-reconstruction analysis-synthesis configurations. In the cross-excitation analysis-synthesis assessment (CEASA) method, the residual signals generated by the analysis filters of the spectral models are injected as excitation into the synthesis filters derived from the same and other models, and the resulting signals are evaluated by the perceptual evaluation of speech quality (PESQ) and the Itakura divergence (ID), which are averaged over a set of models obtained using the objective functions mentioned above. The results show superior performance when the RKLD is used as the loss function for estimating the spectral models, with the ISD ranking close behind. The focus of these divergences on the spectral peaks is argued to be the most important factor for this behavior. Specifically, using the PESQ scores obtained with CEASA, the RKLD loss is found to improve the performance by 1.0%, 4.0% and 19.3% with respect to the open-loop analysis, the KLD and the LSD models, respectively, while the corresponding improvements for the ISD loss are 0.1%, 3.0% and 18.2%, and the RKLD models outperform the ISD models by 1.0% on average. Even though the spectral measures alone are not able to unequivocally distinguish the better of the two, CEASA is shown to have enough sensitivity to distinguish their performances.
In summary, the learning machine S3LM fits models for the short-term spectral envelope of speech and, for the evaluation of its performance under several differentiable loss...
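To make the four spectral measures named in the abstract concrete, the following is a minimal NumPy sketch that uses common textbook forms of the ISD, a generalized KLD for unnormalized spectra, its reverse, and the LSD in dB. These forms, the argument order (p as the reference PSD, q as the model PSD), the direction chosen for the "reverse" KLD, and the helper names `ar_psd`, `isd`, `kld`, `rkld` and `lsd` are assumptions made for illustration; the record does not give the paper's exact definitions, which may differ.

```python
import numpy as np

def ar_psd(a, gain, n_freq=256):
    """PSD of an all-pole (autoregressive) model, gain / |A(e^{jw})|^2,
    sampled on a uniform grid over [0, pi). `a` holds the coefficients of
    A(z) = a[0] + a[1] z^-1 + ... (hypothetical helper, not from the paper)."""
    w = np.linspace(0.0, np.pi, n_freq, endpoint=False)
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ np.asarray(a, dtype=float)
    return gain / np.abs(A) ** 2

def isd(p, q):
    """Itakura-Saito divergence, textbook form: mean(p/q - log(p/q) - 1)."""
    r = p / q
    return np.mean(r - np.log(r) - 1.0)

def kld(p, q):
    """Generalized Kullback-Leibler divergence for unnormalized positive spectra."""
    return np.mean(p * np.log(p / q) - p + q)

def rkld(p, q):
    """Reverse KLD: the same divergence with the arguments swapped (assumed direction)."""
    return kld(q, p)

def lsd(p, q):
    """Log spectral distortion in dB: RMS difference of the log spectra."""
    d = 10.0 * np.log10(p) - 10.0 * np.log10(q)
    return np.sqrt(np.mean(d ** 2))

# Two first-order AR envelopes with different pole radii (illustrative values only).
p = ar_psd([1.0, -0.9], gain=1.0)
q = ar_psd([1.0, -0.5], gain=1.0)
for name, fn in [("ISD", isd), ("KLD", kld), ("RKLD", rkld), ("LSD", lsd)]:
    print(name, fn(p, q))
```

All four functions are built from elementary differentiable operations on positive spectra, which is the property that lets such measures serve as loss functions; the same expressions could be written in an automatic-differentiation framework without structural changes.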
Subjects: Autoregressive processes; Machine learning algorithms; Prediction methods; Self-supervised learning; Speech analysis; Spectral analysis
Record date: 2022-07-18T20:53:34Z
Issue date: 2022-02
Status: published version
Citation: ARJONA RAMÍREZ, M. et al. Differentiable Measures for Speech Spectral Modeling. IEEE Access, [S.l.], v. 10, p. 17609-17618, 2022. DOI: 10.1109/ACCESS.2022.3150728.
Rights: http://creativecommons.org/licenses/by/4.0/ (open access)
Format: application/pdf
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Access
Institution: Universidade Federal de Lavras (UFLA)
Repository contact: nivaldo@ufla.br; repositorio.biblioteca@ufla.br