Improving the role of unvoiced speech segments by spectral normalisation in robust speech recognition
Main author: | Lima, C. S. |
---|---|
Publication date: | 2002 |
Other authors: | Almeida, Luís B.; Monteiro, João L. |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/1822/2147 |
Abstract: | This paper presents a spectral-normalisation-based method for extracting robust speech features in additive noise. The method has two main goals. 1) The “peaked” spectral zones, where most of the speech energy is concentrated, must be preserved as much as possible (from clean to noisy speech features) by the feature extraction process. These spectral regions are usually the most reliable, owing to their higher speech energy and the frequent assumption of independence between speech and noise. 2) The speech regions with less energy need more robustness, since noise is more dominant there and the speech is therefore more corrupted. These regions usually correspond to unvoiced speech, which includes nearly half of the consonants. The proposed normalisation is optimal if the corrupted process and the noise process both have white-noise characteristics. Optimal normalisation means that the corrupting noise does not change the means of the observed vectors of the corrupted process at all. For signal-to-noise ratios greater than 5 dB, the results show that, for stationary white noise, the proposed normalisation, which ignores the noise characteristics, outperforms conventional Markov model composition, which requires the noise to be known. Additionally, if the noise is known, a reasonable approximation of the inverted system can easily be obtained by performing noise compensation, further increasing recogniser performance. |
id |
RCAP_54c90d57254b323bbdc1624fd59ba96e |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/2147 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Improving the role of unvoiced speech segments by spectral normalisation in robust speech recognition |
title |
Improving the role of unvoiced speech segments by spectral normalisation in robust speech recognition |
author |
Lima, C. S. |
author_facet |
Lima, C. S.; Almeida, Luís B.; Monteiro, João L. |
author_role |
author |
author2 |
Almeida, Luís B.; Monteiro, João L. |
author2_role |
author; author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Lima, C. S.; Almeida, Luís B.; Monteiro, João L. |
dc.subject.por.fl_str_mv |
Feature robustness; Robust speech recognition |
topic |
Feature robustness; Robust speech recognition |
publishDate |
2002 |
dc.date.none.fl_str_mv |
2002-09; 2002-09-01T00:00:00Z |
dc.type.driver.fl_str_mv |
conference paper |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1822/2147 |
url |
http://hdl.handle.net/1822/2147 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING (ICSLP), 7, Denver, 2002. |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
International Speech Communication Association |
publisher.none.fl_str_mv |
International Speech Communication Association |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817544816177709056 |
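
The abstract states that the normalisation is optimal when both the corrupted process and the noise are white, and that peaked spectral zones should survive the noise. This record does not give the paper's normalisation formula, so the sketch below assumes a simple sum-to-one normalisation of a power spectrum, and it models the noise through the independence assumption (white noise adds the same expected power to every bin). The function names, the 8-bin spectra, and the noise power are all hypothetical values chosen for illustration, not taken from the paper.

```python
def normalise(power_spectrum):
    """Scale a power spectrum so that its bins sum to one (assumed normalisation)."""
    total = sum(power_spectrum)
    return [p / total for p in power_spectrum]


def add_white_noise(power_spectrum, noise_power):
    """Expected noisy power spectrum when speech and noise are independent:
    white noise contributes the same power to every frequency bin."""
    return [p + noise_power for p in power_spectrum]


# Hypothetical 8-bin power spectra (illustrative values only).
white_like = [2.0] * 8                              # flat, noise-like frame
peaked = [0.1, 0.2, 8.0, 0.3, 0.1, 4.0, 0.2, 0.1]   # frame with two spectral peaks

noise_power = 0.5  # assumed per-bin power of the additive white noise

flat_clean = normalise(white_like)
flat_noisy = normalise(add_white_noise(white_like, noise_power))

peak_clean = normalise(peaked)
peak_noisy = normalise(add_white_noise(peaked, noise_power))

# Largest change that the noise causes in any normalised bin.
flat_shift = max(abs(a - b) for a, b in zip(flat_clean, flat_noisy))
peak_shift = max(abs(a - b) for a, b in zip(peak_clean, peak_noisy))

# Position of the strongest bin before and after adding noise.
peak_bin_clean = peak_clean.index(max(peak_clean))
peak_bin_noisy = peak_noisy.index(max(peak_noisy))

print(f"flat frame, max shift:   {flat_shift:.6f}")
print(f"peaked frame, max shift: {peak_shift:.6f}")
print("peak position kept:", peak_bin_clean == peak_bin_noisy)
```

Under these assumptions the flat frame is left unchanged by the noise after normalisation, while the peaked frame keeps its dominant bin but loses some contrast, which is consistent with the two goals listed in the abstract.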