Análise comparativa entre os métodos HMM e GMM-UBM na busca pelo α-ótimo dos locutores crianças utilizando a técnica VTLN
Main author: | Martins, Ramon Mayor |
---|---|
Publication date: | 2014 |
Document type: | Master's dissertation |
Language: | por |
Source title: | Biblioteca Digital de Teses e Dissertações da INATEL |
Full text: | http://tede.inatel.br:8080/tede/handle/tede/23 |
Abstract: | This work investigates ways to reduce the high error rate of speech recognition systems that are trained on adult speakers and tested on child speakers. To that end, we propose the GMM-UBM method as an alternative to the HMM method for finding the optimal warping factor (α-optimal) for child speakers when speaker normalization is applied. The normalization technique adopted is VTLN, which normalizes the vocal tract length of different child speakers by warping the frequencies of the mel filterbank. The evaluation also sought the number of mixture components that gives the best system performance. The error rate of the system trained on adults and tested on children fell from 4.95% to 1.88% when VTLN used the α-optimal factors found by HMM, and to 1.92% when it used those found by GMM-UBM. VTLN with α-optimal factors found by GMM-UBM thus performed similarly to HMM in the experiments. The experiments indicate that GMM-UBM is the more suitable choice because it is simpler to implement and has a lower computational cost, making it an alternative to HMM for applying VTLN in speech recognition systems for child speakers. |
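The warping-factor search described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the piecewise-linear warp, the 0.85 cutoff ratio, and every function name (`warp_frequency`, `gmm_avg_loglik`, `select_alpha`) are assumptions made for illustration. The idea shown is a grid search: for each candidate α, the speaker's features are recomputed with the warped mel filterbank, scored against a single diagonal-covariance GMM (standing in for the UBM), and the α with the highest average log-likelihood is kept.

```python
import numpy as np


def warp_frequency(freqs, alpha, f_max, cut_ratio=0.85):
    """Piecewise-linear VTLN warp (an assumed form, not taken from the source).

    Frequencies up to a cutoff are scaled by alpha; above the cutoff the map
    is a straight line chosen so that f_max still maps to f_max.
    """
    freqs = np.asarray(freqs, dtype=float)
    # Shrink the cutoff for alpha > 1 so that alpha * f_cut stays below f_max.
    f_cut = cut_ratio * f_max / max(alpha, 1.0)
    upper = alpha * f_cut + (f_max - alpha * f_cut) * (freqs - f_cut) / (f_max - f_cut)
    return np.where(freqs <= f_cut, alpha * freqs, upper)


def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of X (T x D) under a diagonal GMM.

    weights has shape (M,), means and variances have shape (M, D).
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = X[:, None, :] - means[None, :, :]                      # T x M x D
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)  # M
    log_comp = log_norm - 0.5 * (diff ** 2 / variances).sum(axis=2)
    log_comp = log_comp + np.log(weights)                          # T x M
    # Log-sum-exp over components, stabilised by subtracting the row maximum.
    peak = log_comp.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(frame_ll.mean())


def select_alpha(features_for_alpha, gmm, alphas):
    """Grid search: return the alpha whose features score highest under gmm.

    features_for_alpha is a hypothetical callback mapping alpha to a T x D
    feature matrix; gmm is a (weights, means, variances) tuple.
    """
    scores = [gmm_avg_loglik(features_for_alpha(a), *gmm) for a in alphas]
    return alphas[int(np.argmax(scores))], scores
```

In a real front end, `features_for_alpha` would rebuild the mel filterbank with `warp_frequency` applied to its center frequencies and return the resulting cepstral features for one speaker. The candidate grid passed as `alphas` is a free choice here, since the abstract does not state the range or step that was searched.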
id |
INAT_2656bb147a652cef66a26365aad0e28e |
---|---|
oai_identifier_str |
oai:localhost:tede/23 |
network_acronym_str |
INAT |
network_name_str |
Biblioteca Digital de Teses e Dissertações da INATEL |
repository_id_str |
|
dc.title.por.fl_str_mv |
Análise comparativa entre os métodos HMM e GMM-UBM na busca pelo α-ótimo dos locutores crianças utilizando a técnica VTLN |
author |
Martins, Ramon Mayor |
author_facet |
Martins, Ramon Mayor |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Ynoguti, Carlos Alberto |
dc.contributor.advisor1ID.fl_str_mv |
156.167.778-70 |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/5678667205895840 |
dc.contributor.referee1.fl_str_mv |
Ynoguti, Carlos Alberto |
dc.contributor.referee1ID.fl_str_mv |
156.167.778-70 |
dc.contributor.referee1Lattes.fl_str_mv |
http://lattes.cnpq.br/5678667205895840 |
dc.contributor.referee2.fl_str_mv |
Guimarães, Dayan Adionel |
dc.contributor.referee2ID.fl_str_mv |
739.337.836-15 |
dc.contributor.referee2Lattes.fl_str_mv |
http://lattes.cnpq.br/2503439503631682 |
dc.contributor.referee3.fl_str_mv |
Minami, Mário |
dc.contributor.referee3Lattes.fl_str_mv |
http://lattes.cnpq.br/5882877274227409 |
dc.contributor.authorID.fl_str_mv |
052.866.756-46 |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/6289204315531991 |
dc.contributor.author.fl_str_mv |
Martins, Ramon Mayor |
contributor_str_mv |
Ynoguti, Carlos Alberto Ynoguti, Carlos Alberto Guimarães, Dayan Adionel Minami, Mário |
dc.subject.por.fl_str_mv |
Normalização de locutor; sistema de reconhecimento de fala; Modelos Ocultos de Markov; Modelos de Mistura Gaussiana; VTLN |
topic |
Normalização de locutor; sistema de reconhecimento de fala; Modelos Ocultos de Markov; Modelos de Mistura Gaussiana; VTLN Engenharia - Telecomunicações |
dc.subject.cnpq.fl_str_mv |
Engenharia - Telecomunicações |
publishDate |
2014 |
dc.date.issued.fl_str_mv |
2014-10-09 |
dc.date.accessioned.fl_str_mv |
2016-06-27T18:30:31Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
Martins, Ramon Mayor. Análise comparativa entre os métodos HMM e GMM-UBM na busca pelo α-ótimo dos locutores crianças utilizando a técnica VTLN. 2014. [60]. Dissertação (Programa 1) - Instituto Nacional de Telecomunicações, [Santa Rita do Sapucaí]. |
dc.identifier.uri.fl_str_mv |
http://tede.inatel.br:8080/tede/handle/tede/23 |
identifier_str_mv |
Martins, Ramon Mayor. Análise comparativa entre os métodos HMM e GMM-UBM na busca pelo α-ótimo dos locutores crianças utilizando a técnica VTLN. 2014. [60]. Dissertação (Programa 1) - Instituto Nacional de Telecomunicações, [Santa Rita do Sapucaí]. |
url |
http://tede.inatel.br:8080/tede/handle/tede/23 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
http://creativecommons.org/licenses/by-nd/4.0/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-nd/4.0/ |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Instituto Nacional de Telecomunicações |
dc.publisher.program.fl_str_mv |
Mestrado em Engenharia de Telecomunicações |
dc.publisher.initials.fl_str_mv |
INATEL |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
Instituto Nacional de Telecomunicações |
publisher.none.fl_str_mv |
Instituto Nacional de Telecomunicações |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da INATEL instname:Instituto Nacional de Telecomunicações (INATEL) instacron:INATEL |
instname_str |
Instituto Nacional de Telecomunicações (INATEL) |
instacron_str |
INATEL |
institution |
INATEL |
reponame_str |
Biblioteca Digital de Teses e Dissertações da INATEL |
collection |
Biblioteca Digital de Teses e Dissertações da INATEL |
bitstream.url.fl_str_mv |
http://localhost:8080/tede/bitstream/tede/23/1/license.txt http://localhost:8080/tede/bitstream/tede/23/2/license_url http://localhost:8080/tede/bitstream/tede/23/3/license_text http://localhost:8080/tede/bitstream/tede/23/4/license_rdf http://localhost:8080/tede/bitstream/tede/23/5/Dissertac%CC%A7a%CC%83o+V.Final+Ramon+Mayor+Martins.pdf http://localhost:8080/tede/bitstream/tede/23/6/Dissertac%CC%A7a%CC%83o+V.Final+Ramon+Mayor+Martins.pdf.txt http://localhost:8080/tede/bitstream/tede/23/7/Dissertac%CC%A7a%CC%83o+V.Final+Ramon+Mayor+Martins.pdf.jpg |
bitstream.checksum.fl_str_mv |
c6279291b293f0db82678eaa73a27769 587cd8ffae15c8598ed3c46d248a3f38 d41d8cd98f00b204e9800998ecf8427e d41d8cd98f00b204e9800998ecf8427e e21cd6acb902d52fc69d00903b5b1b33 a2cbce108e33bf2eb51ad1eb1a9641f4 6ec32d36c3ca40c02ec2ffd33bff8575 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da INATEL - Instituto Nacional de Telecomunicações (INATEL) |
repository.mail.fl_str_mv |
biblioteca@inatel.br || biblioteca.atendimento@inatel.br |
_version_ |
1800214190323924992 |