Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.

Bibliographic details
Main author: PEREIRA, Bianca Valéria Lopes
Publication date: 2024
Document type: Master's thesis
Language: por
Source title: Biblioteca Digital de Teses e Dissertações da UFMA
Full text: https://tedebc.ufma.br/jspui/handle/tede/tede/5486
Abstract: Phoneme recognition is an area of linguistics and speech processing concerned with identifying and distinguishing the distinctive sounds that make up a language. It requires discerning and categorizing speech sounds despite variations in pronunciation, context, or intonation. This work proposes a phoneme recognition model based on a stacked autoencoder network called CollabNet. In contrast to the traditional stacking of autoencoders, CollabNet inserts new hidden layers collaboratively: each new layer is added in a coordinated, gradual manner, which lets the designer control its influence on training. This collaboration ensures that the learning of the new layer is effectively integrated with that of the previous layers, resulting in more aligned and efficient training. To represent the phonemes, the frequencies were compacted using centroids so as to preserve the particularities of each sound. To build a geometric representation of the audio recordings in the databases, the fast Fourier transform (FFT) of each audio sample was computed, the frequencies were grouped, and the centroid of each group was calculated. The deep stacked autoencoder network was then parameterized and trained to recognize phonetic syllables. This representation preserved the distinguishing characteristics of the recordings, allowing CollabNet to identify the various sounds of Brazilian Portuguese with an accuracy of 75.96% and a phoneme error rate (PER) of 23.73%.
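The feature extraction described in the abstract (FFT, grouping of frequencies, one centroid per group) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the record does not say how the frequencies were grouped or how the centroid was defined, so the sketch assumes contiguous, equally sized bands and a magnitude-weighted mean frequency per band. The function name `centroid_features`, the parameter `n_groups`, and the test tone are hypothetical.

```python
import numpy as np

def centroid_features(signal, sample_rate, n_groups=8):
    """Compact an FFT magnitude spectrum into one centroid per frequency group.

    Assumption: groups are contiguous, equally sized bands, and the centroid
    is the magnitude-weighted mean frequency of each band.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    centroids = []
    for f_band, m_band in zip(np.array_split(freqs, n_groups),
                              np.array_split(spectrum, n_groups)):
        total = m_band.sum()
        # Fall back to the band's mean frequency when the band has no energy.
        centroids.append((f_band * m_band).sum() / total if total > 0 else f_band.mean())
    return np.array(centroids)

# Hypothetical input: a 440 Hz tone sampled at 8 kHz for one second.
sr = 8000
t = np.arange(sr) / sr
features = centroid_features(np.sin(2 * np.pi * 440 * t), sr, n_groups=8)
```

With these assumptions, each recording is reduced to `n_groups` numbers regardless of its length, which is the kind of fixed-size input a stacked autoencoder needs.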
id UFMA_5958d963393faa831ae635884e528a57
oai_identifier_str oai:tede2:tede/5486
network_acronym_str UFMA
network_name_str Biblioteca Digital de Teses e Dissertações da UFMA
repository_id_str 2131
spelling Submitted by Daniella Santos (daniella.santos@ufma.br) on 2024-09-02T19:42:44Z. No. of bitstreams: 1. BIANCAVALÉRIALOPESPEREIRA.pdf: 13272167 bytes, checksum: b750351cfdf854d9c1cd1c4f95ddcc27 (MD5).
Made available in DSpace on 2024-09-02T19:42:44Z (GMT). No. of bitstreams: 1. Previous issue date: 2024-06-06.
Funding agency: CAPES
dc.title.por.fl_str_mv Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
dc.title.alternative.eng.fl_str_mv Phoneme recognition with frequency compression via centroid and stacked autoencoder networks.
title Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
spellingShingle Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
PEREIRA, Bianca Valéria Lopes
reconhecimento de fonemas;
coeficientes de compactação;
stacked autoencoders;
phoneme recognition;
compaction coefficients;
stacked autoencoders.
Ciência da Computação
title_short Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
title_full Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
title_fullStr Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
title_full_unstemmed Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
title_sort Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders.
author PEREIRA, Bianca Valéria Lopes
author_facet PEREIRA, Bianca Valéria Lopes
author_role author
dc.contributor.advisor1.fl_str_mv ALMEIDA NETO, Areolino de
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/8041675571955870
dc.contributor.referee1.fl_str_mv ALMEIDA NETO, Areolino de
dc.contributor.referee1Lattes.fl_str_mv http://lattes.cnpq.br/8041675571955870
dc.contributor.referee2.fl_str_mv OLIVEIRA, Alexandre César Muniz de
dc.contributor.referee2Lattes.fl_str_mv http://lattes.cnpq.br/5225588855422632
dc.contributor.referee3.fl_str_mv SAMPAIO NETO, Nelson Cruz
dc.contributor.referee3Lattes.fl_str_mv http://lattes.cnpq.br/9756167788721062
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/0100453417772333
dc.contributor.author.fl_str_mv PEREIRA, Bianca Valéria Lopes
contributor_str_mv ALMEIDA NETO, Areolino de
ALMEIDA NETO, Areolino de
OLIVEIRA, Alexandre César Muniz de
SAMPAIO NETO, Nelson Cruz
dc.subject.por.fl_str_mv reconhecimento de fonemas;
coeficientes de compactação;
stacked autoencoders;
topic reconhecimento de fonemas;
coeficientes de compactação;
stacked autoencoders;
phoneme recognition;
compaction coefficients;
stacked autoencoders.
Ciência da Computação
dc.subject.eng.fl_str_mv phoneme recognition;
compaction coefficients;
stacked autoencoders.
dc.subject.cnpq.fl_str_mv Ciência da Computação
description Phoneme recognition is an area of linguistics and speech processing concerned with identifying and distinguishing the distinctive sounds that make up a language. It requires discerning and categorizing speech sounds despite variations in pronunciation, context, or intonation. This work proposes a phoneme recognition model based on a stacked autoencoder network called CollabNet. In contrast to the traditional stacking of autoencoders, CollabNet inserts new hidden layers collaboratively: each new layer is added in a coordinated, gradual manner, which lets the designer control its influence on training. This collaboration ensures that the learning of the new layer is effectively integrated with that of the previous layers, resulting in more aligned and efficient training. To represent the phonemes, the frequencies were compacted using centroids so as to preserve the particularities of each sound. To build a geometric representation of the audio recordings in the databases, the fast Fourier transform (FFT) of each audio sample was computed, the frequencies were grouped, and the centroid of each group was calculated. The deep stacked autoencoder network was then parameterized and trained to recognize phonetic syllables. This representation preserved the distinguishing characteristics of the recordings, allowing CollabNet to identify the various sounds of Brazilian Portuguese with an accuracy of 75.96% and a phoneme error rate (PER) of 23.73%.
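For readers unfamiliar with the PER figure quoted in the description, the sketch below shows the conventional way such a rate is computed: the edit distance between the reference and recognized phoneme sequences, divided by the reference length. The record does not state which definition the thesis used, so treat this as the standard formulation rather than the author's exact procedure. The function name `phoneme_error_rate` and the sample sequences are hypothetical.

```python
def phoneme_error_rate(reference, hypothesis):
    """PER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    # Row-by-row Levenshtein distance over phoneme symbols.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)

# Hypothetical sequences: one substitution (z -> s) and one insertion (s).
ref = ["k", "a", "z", "a"]
hyp = ["k", "a", "s", "a", "s"]
per = phoneme_error_rate(ref, hyp)  # 2 edits / 4 reference phonemes = 0.5
```

Because this formulation also counts insertions, a PER computed this way need not equal 100% minus the classification accuracy.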
publishDate 2024
dc.date.accessioned.fl_str_mv 2024-09-02T19:42:44Z
dc.date.issued.fl_str_mv 2024-06-06
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv PEREIRA, Bianca Valéria Lopes. Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. 2024. 115 f. Dissertação (Programa de Pós-Graduação em Ciência da Computação/CCET) - Universidade Federal do Maranhão, São Luís, 2024.
dc.identifier.uri.fl_str_mv https://tedebc.ufma.br/jspui/handle/tede/tede/5486
identifier_str_mv PEREIRA, Bianca Valéria Lopes. Reconhecimento de fonemas com compactação das frequências via centroide e redes stacked autoencoders. 2024. 115 f. Dissertação (Programa de Pós-Graduação em Ciência da Computação/CCET) - Universidade Federal do Maranhão, São Luís, 2024.
url https://tedebc.ufma.br/jspui/handle/tede/tede/5486
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal do Maranhão
dc.publisher.program.fl_str_mv PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO/CCET
dc.publisher.initials.fl_str_mv UFMA
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv DEPARTAMENTO DE INFORMÁTICA/CCET
publisher.none.fl_str_mv Universidade Federal do Maranhão
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações da UFMA
instname:Universidade Federal do Maranhão (UFMA)
instacron:UFMA
instname_str Universidade Federal do Maranhão (UFMA)
instacron_str UFMA
institution UFMA
reponame_str Biblioteca Digital de Teses e Dissertações da UFMA
collection Biblioteca Digital de Teses e Dissertações da UFMA
bitstream.url.fl_str_mv http://tedebc.ufma.br:8080/bitstream/tede/5486/2/BIANCAVAL%C3%89RIALOPESPEREIRA.pdf
http://tedebc.ufma.br:8080/bitstream/tede/5486/1/license.txt
bitstream.checksum.fl_str_mv b750351cfdf854d9c1cd1c4f95ddcc27
97eeade1fce43278e63fe063657f8083
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da UFMA - Universidade Federal do Maranhão (UFMA)
repository.mail.fl_str_mv repositorio@ufma.br
_version_ 1809926182947258368