Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems

Carriço, Bruna; Shulby, Christopher; Moniz, Helena

Bibliographic details
Main author:	Carriço, Bruna
Publication date:	2023
Other authors:	Shulby, Christopher; Moniz, Helena
Document type:	Article
Language:	por
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	https://doi.org/10.26334/2183-9077/rapln10ano2023a6
Abstract:	This paper describes the linguistic preprocessing methods used in hybrid systems provided by an international Artificial Intelligence (AI) company, Defined.ai. The startup focuses on providing high-quality data, models, and AI tools. The main goal of this work is to improve the quality of preprocessing models by applying linguistic knowledge. We therefore focus on two introductory linguistic models in a speech pipeline: the Normalizer and the Grapheme-to-Phone (G2P). To do so, two initiatives were conducted in collaboration with the Defined.ai Machine Learning team. The first project focuses on expanding and improving a European Portuguese Normalizer model. The second project covers creating G2P models for two languages: Swedish and Russian. Results show that a rule-based approach to the Normalizer and the G2P increases their accuracy and performance, representing a significant advantage in improving Defined.ai tools and speech pipelines. In addition, with the results obtained in the first project, we made the normalizer easier to use by augmenting each rule with linguistic knowledge. Accordingly, our research demonstrates the added value of linguistic knowledge in preprocessing models.

Item metadata

id	RCAP_f8bcc15a53eba9338ab47824ef507077
oai_identifier_str	oai:ojs3.ojs.apl.pt:article/167
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems / Modelos de pré-processamento para tecnologias da fala: O impacto do normalizador e do grafema-fone nos sistemas híbridos
title	Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems
spellingShingle	Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems Carriço, Bruna tecnologias de fala normalizador grafema-fone conhecimento linguístico modelos speech technologies normalizer grapheme-to-phone linguistic knowledge models
title_short	Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems
title_full	Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems
title_fullStr	Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems
title_full_unstemmed	Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems
title_sort	Preprocessing models for speech technologies: The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems
author	Carriço, Bruna
author_facet	Carriço, Bruna Shulby, Christopher Moniz, Helena
author_role	author
author2	Shulby, Christopher Moniz, Helena
author2_role	author author
dc.contributor.author.fl_str_mv	Carriço, Bruna Shulby, Christopher Moniz, Helena
dc.subject.por.fl_str_mv	tecnologias de fala; normalizador; grafema-fone; conhecimento linguístico; modelos; speech technologies; normalizer; grapheme-to-phone; linguistic knowledge; models
topic	tecnologias de fala; normalizador; grafema-fone; conhecimento linguístico; modelos; speech technologies; normalizer; grapheme-to-phone; linguistic knowledge; models
description	This paper describes the linguistic preprocessing methods used in hybrid systems provided by an international Artificial Intelligence (AI) company, Defined.ai. The startup focuses on providing high-quality data, models, and AI tools. The main goal of this work is to improve the quality of preprocessing models by applying linguistic knowledge. We therefore focus on two introductory linguistic models in a speech pipeline: the Normalizer and the Grapheme-to-Phone (G2P). To do so, two initiatives were conducted in collaboration with the Defined.ai Machine Learning team. The first project focuses on expanding and improving a European Portuguese Normalizer model. The second project covers creating G2P models for two languages: Swedish and Russian. Results show that a rule-based approach to the Normalizer and the G2P increases their accuracy and performance, representing a significant advantage in improving Defined.ai tools and speech pipelines. In addition, with the results obtained in the first project, we made the normalizer easier to use by augmenting each rule with linguistic knowledge. Accordingly, our research demonstrates the added value of linguistic knowledge in preprocessing models.
publishDate	2023
dc.date.none.fl_str_mv	2023-10-22
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://doi.org/10.26334/2183-9077/rapln10ano2023a6
url	https://doi.org/10.26334/2183-9077/rapln10ano2023a6
dc.language.iso.fl_str_mv	por
language	por
dc.relation.none.fl_str_mv	https://ojs.apl.pt/index.php/rapl/article/view/167 https://ojs.apl.pt/index.php/rapl/article/view/167/221
dc.rights.driver.fl_str_mv	Copyright (c) 2023 Bruna Carriço, Christopher Shulby, Helena Moniz info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Copyright (c) 2023 Bruna Carriço, Christopher Shulby, Helena Moniz
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Associação Portuguesa de Linguística
publisher.none.fl_str_mv	Associação Portuguesa de Linguística
dc.source.none.fl_str_mv	Revista da Associação Portuguesa de Linguística; No. 10 (2023): Journal of the Portuguese Linguistics Association; 92–114 Revista da Associação Portuguesa de Linguística; N.º 10 (2023): Revista da Associação Portuguesa de Linguística; 92–114 2183-9077 10.26334/2183-9077/rapln10ano2023td reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799134142503321600
