Differentiable Measures for Speech Spectral Modeling
Main author: | Arjona Ramírez, Miguel |
---|---|
Publication date: | 2022 |
Other authors: | Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFLA |
Full text: | http://repositorio.ufla.br/jspui/handle/1/50638 |
Abstract: | Autoregressive models for the envelope of speech power spectral densities (PSDs) are refined by the self-supervised spectral learning machine (S3LM), which is provided with differentiable spectral objective functions. Those displaying the more significant results are the Itakura-Saito divergence (ISD), the Kullback-Leibler divergence (KLD), the reverse KLD (RKLD) and the log spectral distortion (LSD). To assess the models on a more perceptual basis, a method is proposed that is based on perturbations around perfect-reconstruction analysis-synthesis configurations. In the cross-excitation analysis-synthesis assessment (CEASA) method, the residual signals generated by the analysis filters of the spectral models are injected as excitation into the synthesis filters derived from the same and from other models. The resulting signals are evaluated by the perceptual evaluation of speech quality (PESQ) and the Itakura divergence (ID), and the scores are averaged over a set of models obtained using the objective functions mentioned above. The results show superior performance when the RKLD is used as the loss function for the estimation of the spectral models, with the ISD ranking close behind. The focus of these divergences on the spectral peaks is argued to be the most important factor behind this behavior. Specifically, using the PESQ scores obtained with CEASA, the RKLD loss is found to improve the performance by 1.0%, 4.0% and 19.3% with respect to the open-loop analysis, the KLD and the LSD models, respectively, while the corresponding improvements for the ISD loss are 0.1%, 3.0% and 18.2%, and the RKLD models outperform the ISD models by 1.0% on average. Even though the spectral measures alone cannot unequivocally distinguish the better of the two, CEASA is shown to be sensitive enough to distinguish their performances.
In summary, the learning machine S3LM fits models for the short-term spectral envelope of speech and, for the evaluation of its performance under several differentiable loss... |
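
The abstract names four spectral measures and the cross-excitation idea without giving formulas. The sketch below is only an illustration under stated assumptions: it uses common textbook forms of the measures (the paper's exact normalization and argument order may differ), takes `p` as the reference PSD and `q` as the model PSD, and reduces CEASA to its two filters with hypothetical predictor coefficients. All function names and toy values are assumptions made for this example, and PESQ and the Itakura divergence are not computed here.

```python
import math

# --- Spectral measures (textbook forms; p = reference PSD, q = model PSD) ---

def isd(p, q):
    """Itakura-Saito divergence, averaged over frequency bins."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(p, q)) / len(p)

def kld(p, q):
    """Generalized Kullback-Leibler divergence for unnormalized positive spectra."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(p, q)) / len(p)

def rkld(p, q):
    """Reverse KLD: the same divergence with the two arguments swapped."""
    return kld(q, p)

def lsd(p, q):
    """Log spectral distortion in dB (RMS difference of the log spectra)."""
    return math.sqrt(sum((10.0 * math.log10(a / b)) ** 2 for a, b in zip(p, q)) / len(p))

# --- Cross-excitation analysis-synthesis, reduced to its two filters ---

def analysis(x, a):
    """Inverse filter A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...; returns the residual."""
    return [x[n] + sum(c * x[n - 1 - k] for k, c in enumerate(a) if n - 1 - k >= 0)
            for n in range(len(x))]

def synthesis(e, a):
    """All-pole filter 1/A(z) driven by the excitation e (zero initial state)."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(c * y[n - 1 - k] for k, c in enumerate(a) if n - 1 - k >= 0))
    return y

# Toy data (hypothetical values, not taken from the paper).
psd = [1.0, 4.0, 16.0, 4.0, 1.0]
under = [0.5 * v for v in psd]   # model about 3 dB below the reference everywhere
over = [2.0 * v for v in psd]    # model about 3 dB above the reference everywhere

speech = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(64)]
model_a = [-0.9, 0.2]            # hypothetical second-order predictor
model_b = [-0.8, 0.1]            # a slightly perturbed second model

same = synthesis(analysis(speech, model_a), model_a)    # same-model path
cross = synthesis(analysis(speech, model_a), model_b)   # cross-excitation path
err_same = max(abs(u - v) for u, v in zip(same, speech))
err_cross = max(abs(u - v) for u, v in zip(cross, speech))
```

In this toy setting the same-model path reproduces the input up to rounding error, which is the perfect-reconstruction configuration, while the cross path deviates from it; that deviation is the kind of perturbation a perceptual score would then rate. The ISD as written also penalizes a model that sits below the reference more than one that sits above it by the same ratio, whereas the LSD treats both cases identically.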
id |
UFLA_60d6c80f6569284d3f17d2b41882207e |
---|---|
oai_identifier_str |
oai:localhost:1/50638 |
network_acronym_str |
UFLA |
network_name_str |
Repositório Institucional da UFLA |
repository_id_str |
|
dc.title.none.fl_str_mv |
Differentiable Measures for Speech Spectral Modeling |
author |
Arjona Ramírez, Miguel |
author_facet |
Arjona Ramírez, Miguel; Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes |
author_role |
author |
author2 |
Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes |
author2_role |
author; author; author |
dc.contributor.author.fl_str_mv |
Arjona Ramírez, Miguel; Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes |
dc.subject.por.fl_str_mv |
Autoregressive processes; Machine learning algorithms; Prediction methods; Self-supervised learning; Speech analysis; Spectral analysis |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-07-18T20:53:34Z; 2022-07-18T20:53:34Z; 2022-02 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
ARJONA RAMÍREZ, M. et al. Differentiable Measures for Speech Spectral Modeling. IEEE Access, [S.I.], v. 10, p. 17609-17618, 2022. DOI: 10.1109/ACCESS.2022.3150728. http://repositorio.ufla.br/jspui/handle/1/50638 |
identifier_str_mv |
ARJONA RAMÍREZ, M. et al. Differentiable Measures for Speech Spectral Modeling. IEEE Access, [S.I.], v. 10, p. 17609-17618, 2022. DOI: 10.1109/ACCESS.2022.3150728. |
url |
http://repositorio.ufla.br/jspui/handle/1/50638 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
http://creativecommons.org/licenses/by/4.0/; info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by/4.0/ |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Institute of Electrical and Electronics Engineers (IEEE) |
dc.source.none.fl_str_mv |
IEEE Access; reponame: Repositório Institucional da UFLA; instname: Universidade Federal de Lavras (UFLA); instacron: UFLA |
instname_str |
Universidade Federal de Lavras (UFLA) |
instacron_str |
UFLA |
institution |
UFLA |
reponame_str |
Repositório Institucional da UFLA |
collection |
Repositório Institucional da UFLA |
repository.name.fl_str_mv |
Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA) |
repository.mail.fl_str_mv |
nivaldo@ufla.br || repositorio.biblioteca@ufla.br |