Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
Main author: | PEREIRA, Bianca Valéria Lopes |
---|---|
Publication date: | 2024 |
Document type: | Master's thesis |
Language: | por |
Source title: | Biblioteca Digital de Teses e Dissertações da UFMA |
Full text: | https://tedebc.ufma.br/jspui/handle/tede/tede/5486 |
Abstract: | Phoneme recognition is an area of linguistics and speech processing concerned with identifying and distinguishing the distinctive sounds that make up a language. It requires discerning and categorizing speech sounds despite variations in pronunciation, context, or intonation. This work proposes a phoneme recognition model based on a stacked autoencoder network called CollabNet. In contrast to the traditional stacking of autoencoders, CollabNet introduces a collaborative method for inserting new hidden layers: each new layer is added in a coordinated, gradual manner, which lets the designer control its influence on training. This collaboration integrates the learning of the new layer with that of the previous layers, resulting in more aligned and efficient training. To represent the phonemes, the frequencies were compacted using centroids so as to preserve the particularities of each sound. To build a geometric representation of the audio recordings in the databases, the fast Fourier transform (FFT) was computed for each audio sample, the resulting frequencies were grouped, and the centroid of each group was calculated. The deep stacked autoencoder network was then parameterized and trained to recognize phonetic syllables. This representation preserved the distinguishing characteristics of the recordings well enough for CollabNet to identify the various sounds of Brazilian Portuguese, achieving an accuracy of 75.96% and a phoneme error rate (PER) of 23.73%. |
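
The frequency compaction step described in the abstract (FFT, grouping of frequencies, one centroid per group) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not say how groups are formed or how the centroid is defined, so the sketch assumes equal-width groups of FFT bins and takes the centroid of a group to be the pair (magnitude-weighted mean frequency, mean magnitude). The names `fft`, `centroid_features`, `sample_rate`, and `n_groups` are hypothetical, and a small pure-Python radix-2 FFT stands in for whatever FFT routine the author used.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def centroid_features(samples, sample_rate, n_groups):
    """Compact a magnitude spectrum into one centroid per frequency group.

    Assumption (not from the source): groups are contiguous, equal-width
    runs of FFT bins, and each centroid is the pair
    (magnitude-weighted mean frequency, mean magnitude).
    """
    n = len(samples)
    spectrum = fft(samples)
    half = n // 2  # keep only the bins below the Nyquist frequency
    if not 1 <= n_groups <= half:
        raise ValueError("n_groups must be between 1 and len(samples) // 2")
    freqs = [k * sample_rate / n for k in range(half)]
    mags = [abs(spectrum[k]) for k in range(half)]

    size = half // n_groups  # bins per group; leftover bins join the last group
    features = []
    for g in range(n_groups):
        lo = g * size
        hi = half if g == n_groups - 1 else lo + size
        total = sum(mags[lo:hi])
        if total > 0:
            f_c = sum(f * m for f, m in zip(freqs[lo:hi], mags[lo:hi])) / total
        else:
            f_c = sum(freqs[lo:hi]) / (hi - lo)  # silent group: use its centre
        features.append((f_c, total / (hi - lo)))
    return features


# Usage: a 1000 Hz tone sampled at 8 kHz, compacted into 4 groups.
# With 256 samples the tone sits exactly on bin 32, the first bin of group 1.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
feats = centroid_features(tone, sr, 4)
```

Under these assumptions a 128-bin half spectrum shrinks to four (frequency, magnitude) pairs, which is the kind of fixed-size vector that could be fed to an autoencoder input layer. The group count trades compactness against spectral detail.
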
id |
UFMA_5958d963393faa831ae635884e528a57 |
---|---|
oai_identifier_str |
oai:tede2:tede/5486 |
network_acronym_str |
UFMA |
network_name_str |
Biblioteca Digital de Teses e Dissertações da UFMA |
repository_id_str |
2131 |
spelling |
Submitted by Daniella Santos (daniella.santos@ufma.br) on 2024-09-02T19:42:44Z. No. of bitstreams: 1. BIANCAVALÉRIALOPESPEREIRA.pdf: 13272167 bytes, checksum: b750351cfdf854d9c1cd1c4f95ddcc27 (MD5). Made available in DSpace on 2024-09-02T19:42:44Z (GMT). Previous issue date: 2024-06-06. Funding: CAPES. license.txt: 2255 bytes, text/plain; charset=utf-8, checksum: 97eeade1fce43278e63fe063657f8083 (MD5). |
dc.title.por.fl_str_mv |
Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. |
dc.title.alternative.eng.fl_str_mv |
Phoneme recognition with frequency compression via centroid and stacked autoencoder networks. |
title |
Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. |
spellingShingle |
Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. PEREIRA, Bianca Valéria Lopes reconhecimento de fonemas; coeficientes de compactação; stacked autoencoders; phoneme recognition; compaction coefficients; stacked autoencoders. Ciência da Computação |
title_short |
Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. |
title_full |
Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. |
title_fullStr |
Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. |
title_full_unstemmed |
Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. |
title_sort |
Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. |
author |
PEREIRA, Bianca Valéria Lopes |
author_facet |
PEREIRA, Bianca Valéria Lopes |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
ALMEIDA NETO, Areolino de |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/8041675571955870 |
dc.contributor.referee1.fl_str_mv |
ALMEIDA NETO, Areolino de |
dc.contributor.referee1Lattes.fl_str_mv |
http://lattes.cnpq.br/8041675571955870 |
dc.contributor.referee2.fl_str_mv |
OLIVEIRA, Alexandre César Muniz de |
dc.contributor.referee2Lattes.fl_str_mv |
http://lattes.cnpq.br/5225588855422632 |
dc.contributor.referee3.fl_str_mv |
SAMPAIO NETO, Nelson Cruz |
dc.contributor.referee3Lattes.fl_str_mv |
http://lattes.cnpq.br/9756167788721062 |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/0100453417772333 |
dc.contributor.author.fl_str_mv |
PEREIRA, Bianca Valéria Lopes |
contributor_str_mv |
ALMEIDA NETO, Areolino de ALMEIDA NETO, Areolino de OLIVEIRA, Alexandre César Muniz de SAMPAIO NETO, Nelson Cruz |
dc.subject.por.fl_str_mv |
reconhecimento de fonemas; coeficientes de compactação; stacked autoencoders; |
topic |
reconhecimento de fonemas; coeficientes de compactação; stacked autoencoders; phoneme recognition; compaction coefficients; stacked autoencoders. Ciência da Computação |
dc.subject.eng.fl_str_mv |
phoneme recognition; compaction coefficients; stacked autoencoders. |
dc.subject.cnpq.fl_str_mv |
Ciência da Computação |
publishDate |
2024 |
dc.date.accessioned.fl_str_mv |
2024-09-02T19:42:44Z |
dc.date.issued.fl_str_mv |
2024-06-06 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
PEREIRA, Bianca Valéria Lopes. Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. 2024. 115 f. Dissertação (Programa de Pós-Graduação em Ciência da Computação/CCET) - Universidade Federal do Maranhão, São Luís, 2024. |
dc.identifier.uri.fl_str_mv |
https://tedebc.ufma.br/jspui/handle/tede/tede/5486 |
identifier_str_mv |
PEREIRA, Bianca Valéria Lopes. Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. 2024. 115 f. Dissertação (Programa de Pós-Graduação em Ciência da Computação/CCET) - Universidade Federal do Maranhão, São Luís, 2024. |
url |
https://tedebc.ufma.br/jspui/handle/tede/tede/5486 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Federal do Maranhão |
dc.publisher.program.fl_str_mv |
PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO/CCET |
dc.publisher.initials.fl_str_mv |
UFMA |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
DEPARTAMENTO DE INFORMÁTICA/CCET |
publisher.none.fl_str_mv |
Universidade Federal do Maranhão |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da UFMA instname:Universidade Federal do Maranhão (UFMA) instacron:UFMA |
instname_str |
Universidade Federal do Maranhão (UFMA) |
instacron_str |
UFMA |
institution |
UFMA |
reponame_str |
Biblioteca Digital de Teses e Dissertações da UFMA |
collection |
Biblioteca Digital de Teses e Dissertações da UFMA |
bitstream.url.fl_str_mv |
http://tedebc.ufma.br:8080/bitstream/tede/5486/2/BIANCAVAL%C3%89RIALOPESPEREIRA.pdf http://tedebc.ufma.br:8080/bitstream/tede/5486/1/license.txt |
bitstream.checksum.fl_str_mv |
b750351cfdf854d9c1cd1c4f95ddcc27 97eeade1fce43278e63fe063657f8083 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da UFMA - Universidade Federal do Maranhão (UFMA) |
repository.mail.fl_str_mv |
repositorio@ufma.br||repositorio@ufma.br |
_version_ |
1809926182947258368 |