A customized residual neural network and bi-directional gated recurrent unit-based automatic speech recognition model

Bibliographic details
Main author: Selim Reza
Publication date: 2022
Other authors: Marta Campos Ferreira, J.J.M. Machado, João Manuel R. S. Tavares
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://hdl.handle.net/10216/145993
Abstract: Speech recognition converts human speech into text and has applications in security, healthcare, commerce, automobiles, and technology, among other fields. Inserting residual neural networks before recurrent neural network cells improves accuracy and substantially reduces training time. Furthermore, layer normalization is more effective than batch normalization for model training and performance. Dataset size also strongly influences the best achievable performance. Building on these observations, this article proposes an automatic speech recognition model that stacks five layers of a customized residual convolutional neural network and seven layers of bidirectional gated recurrent units, with a logarithmic softmax (log-softmax) at the model output. Each of these layers incorporates layer normalization with learnable per-element affine parameters. The new model was trained and tested on the LibriSpeech corpus and the LJ Speech dataset. The experimental results show character error rates (CER) of 4.7% and 3.61% on the two datasets, respectively, with only 33 million parameters and without any external language model.
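The abstract says every layer uses layer normalization with learnable per-element affine parameters and that the network ends in a logarithmic softmax. The pure-Python sketch below shows both operations on a single feature vector. It assumes normalization over the last (feature) dimension, the biased variance, and eps = 1e-5, which matches the defaults of PyTorch's `torch.nn.LayerNorm` with `elementwise_affine=True`; the record does not state which framework or settings the authors used. The function names and example values are illustrative only.

```python
import math
from typing import List, Sequence


def layer_norm(x: Sequence[float], gamma: Sequence[float], beta: Sequence[float],
               eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and unit variance, then apply
    a per-element affine transform (one learnable gamma and beta per feature)."""
    n = len(x)
    if not (len(gamma) == len(beta) == n):
        raise ValueError("gamma and beta must match the feature dimension")
    mean = sum(x) / n
    # Biased (population) variance, as in standard layer normalization.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: z_i - log(sum_j exp(z_j))."""
    m = max(logits)
    # Subtracting the maximum before exponentiating avoids overflow.
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


# Hypothetical example: one frame of 4 features with an identity affine
# transform (gamma = 1, beta = 0), as at initialization.
frame = [1.0, 2.0, 3.0, 4.0]
normed = layer_norm(frame, gamma=[1.0] * 4, beta=[0.0] * 4)

# Hypothetical per-symbol scores for a 3-symbol output vocabulary.
log_probs = log_softmax([2.0, 1.0, 0.1])
print([round(v, 4) for v in normed])
print(sum(math.exp(p) for p in log_probs))
```

Unlike batch normalization, these statistics are computed per example, so they do not depend on batch composition. Working in log space also keeps per-frame probabilities numerically stable when they are combined over long utterances.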
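The reported results are character error rates. A common definition, assumed here because the record does not describe the authors' evaluation script, is the character-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference characters. Pooling edits over the whole test set rather than averaging per-utterance rates is also an assumption. The transcripts below are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum number of unit-cost substitutions,
    deletions, and insertions that turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str],
                         hypotheses: Sequence[str]) -> float:
    """Corpus-level CER in percent: total character edits divided by the
    total number of reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("need one hypothesis per reference")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


# Hypothetical transcripts for illustration only:
# one substitution ("a" -> "i") and one deletion (one "l" dropped).
refs = ["the cat sat", "hello world"]
hyps = ["the cat sit", "helo world"]
print(character_error_rate(refs, hyps))
```

Because pooling weights each utterance by its length, long utterances count for more than short ones. A per-utterance average can give a different figure on the same data.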
OAI identifier: oai:repositorio-aberto.up.pt:10216/145993
Repository ID: 7160
Subjects: Technological sciences; Engineering and technology
Issue date: 2022-04
Version: published version
Journal: Expert Systems with Applications (ISSN 0957-4174)
DOI: 10.1016/j.eswa.2022.119293
Access rights: open access
File formats: application/pdf, image/png
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)