Reconhecimento Automático de Fonemas via RNA Profunda
Autor(a) principal: | |
---|---|
Data de Publicação: | 2020 |
Tipo de documento: | Dissertação |
Idioma: | por |
Título da fonte: | Biblioteca Digital de Teses e Dissertações da UFMA |
Texto Completo: | https://tedebc.ufma.br/jspui/handle/tede/tede/3355 |
Resumo: | This work presents a phoneme recognition model using object detection techniques. The Single Shot Detection detector was used in conjunction with the MobileNet convolutional network architecture. The databases used in model training were TIMIT and LibriSpeech, both have spoken audios in English. To generate a graphical representation using the audiobases, for each audio, its spectrogram was calculated on the Mel scale and to train the algorithm of phoneme location detection, the temporal position of the occurrence of each phoneme in its respective was noted for its spectrogram. Additionally, it was necessary to increase the training data set, in order to provide improvement in the generalization of the model and for that, the two databases were joined and data augmentation techniques were applied to audios. The results of this work were close to the results obtained in other state of the art works. This research used two models with different architectures: the MobileNet-Large architecture, which obtained an accuracy of 0.72 mAP@0.5IOU and an error rate per phoneme of 19.47 % and the MobileNet-Small architecture, which obtained an accuracy of 0.63 mAP@0.5IOU and error rate per phoneme equal to 31.02 %. |
id |
UFMA_3786ef45815fd44dfa376796e20f66c6 |
---|---|
oai_identifier_str |
oai:tede2:tede/3355 |
network_acronym_str |
UFMA |
network_name_str |
Biblioteca Digital de Teses e Dissertações da UFMA |
repository_id_str |
2131 |
spelling |
ALMEIDA NETO, Areolino dehttp://lattes.cnpq.br/8041675571955870ALMEIDA NETO, Areolino dehttp://lattes.cnpq.br/8041675571955870OLIVEIRA, Alexandre César Muniz dehttp://lattes.cnpq.br/5225588855422632SILVA, Rogério Moreira Limahttp://lattes.cnpq.br/0490351544174740http://lattes.cnpq.br/2756606178387194CARVALHO, Mateus Barros Frota de2021-09-23T14:48:10Z2020-12-11CARVALHO, Mateus Barros Frota de. Reconhecimento Automático de Fonemas via RNA Profunda. 2020. 68 f. Dissertação (Programa de Pós-Graduação em Ciência da Computação/CCET) - Universidade Federal do Maranhão, São Luís, 2020.https://tedebc.ufma.br/jspui/handle/tede/tede/3355This work presents a phoneme recognition model using object detection techniques. The Single Shot Detection detector was used in conjunction with the MobileNet convolutional network architecture. The databases used in model training were TIMIT and LibriSpeech, both have spoken audios in English. To generate a graphical representation using the audiobases, for each audio, its spectrogram was calculated on the Mel scale and to train the algorithm of phoneme location detection, the temporal position of the occurrence of each phoneme in its respective was noted for its spectrogram. Additionally, it was necessary to increase the training data set, in order to provide improvement in the generalization of the model and for that, the two databases were joined and data augmentation techniques were applied to audios. The results of this work were close to the results obtained in other state of the art works. This research used two models with different architectures: the MobileNet-Large architecture, which obtained an accuracy of 0.72 mAP@0.5IOU and an error rate per phoneme of 19.47 % and the MobileNet-Small architecture, which obtained an accuracy of 0.63 mAP@0.5IOU and error rate per phoneme equal to 31.02 %.Este trabalho apresenta um modelo de reconhecimento de fonemas utilizando técnicas de detecção de objetos. Utilizou-se o detector Single Shot Detection em conjunto com a arquitetura de rede convolucional MobileNet. As bases de dados empregadas para treinar o modelo foram a TIMIT e a LibriSpeech, ambas são constituídas por áudios da língua inglesa. Para criar uma representação gráfica dos áudios das bases, para cada amostra de áudio, calculou-se o seu espectrograma na escala de Mel e para treinar o algoritmo de detecção de localização dos fonemas, anotou-se a posição temporal da ocorrência de cada fonema no seu respectivo espectrograma. Adicionalmente, foi necessário aumentar o conjunto de dados de treino, de forma a proporcionar melhora na generalização do modelo e para isso, juntaramse as duas bases de dados e aplicaram-se técnicas de aumento de dados para áudios. Os resultados deste trabalho ficaram próximos dos resultados obtidos em importantes trabalhos recentemente publicados. Esta pesquisa usou dois modelos com arquiteturas diferentes: a arquitetura MobileNet−Large, a qual obteve uma acurácia de 0,72 mAP@0.5IOU e uma taxa de erro por fonema de 19,47% e a arquitetura MobileNet − Small, a qual obteve uma acurácia de 0,63 mAP@0.5IOU e taxa de erro por fonema igual a 31,02%.Submitted by Sheila MONTEIRO (sheila.monteiro@ufma.br) on 2021-09-23T14:48:10Z No. of bitstreams: 1 MATEUS-CARVALHO.pdf: 2251513 bytes, checksum: 9136b046c2cd96099f89eac7609bf9b1 (MD5)Made available in DSpace on 2021-09-23T14:48:10Z (GMT). No. of bitstreams: 1 MATEUS-CARVALHO.pdf: 2251513 bytes, checksum: 9136b046c2cd96099f89eac7609bf9b1 (MD5) Previous issue date: 2020-12-11application/pdfporUniversidade Federal do MaranhãoPROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO/CCETUFMABrasilDEPARTAMENTO DE INFORMÁTICA/CCETDetecção de objetosReconhecimento de falaReconhecimento de fonemasObject detectionVoice recognitionPhoneme recognitionCiência da ComputaçãoReconhecimento Automático de Fonemas via RNA ProfundaAutomatic Phoneme Recognition via Deep ANNinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisinfo:eu-repo/semantics/openAccessreponame:Biblioteca Digital de Teses e Dissertações da UFMAinstname:Universidade Federal do Maranhão (UFMA)instacron:UFMAORIGINALMATEUS-CARVALHO.pdfMATEUS-CARVALHO.pdfapplication/pdf2251513http://tedebc.ufma.br:8080/bitstream/tede/3355/2/MATEUS-CARVALHO.pdf9136b046c2cd96099f89eac7609bf9b1MD52LICENSElicense.txtlicense.txttext/plain; charset=utf-82255http://tedebc.ufma.br:8080/bitstream/tede/3355/1/license.txt97eeade1fce43278e63fe063657f8083MD51tede/33552021-09-23 11:48:10.38oai:tede2:tede/3355IExJQ0VOw4dBIERFIERJU1RSSUJVScOHw4NPIE7Dg08tRVhDTFVTSVZBCgpDb20gYSBhcHJlc2VudGHDp8OjbyBkZXN0YSBsaWNlbsOnYSxvIGF1dG9yIChlcykgb3UgbyB0aXR1bGFyIGRvcyBkaXJlaXRvcyBkZSBhdXRvciBjb25jZWRlIMOgIFVuaXZlcnNpZGFkZSBGZWRlcmFsIGRvIE1hcmFuaMOjbyAoVUZNQSkgbyBkaXJlaXRvIG7Do28tZXhjbHVzaXZvIGRlIHJlcHJvZHV6aXIsIHRyYWR1emlyIChjb25mb3JtZSBkZWZpbmlkbyBhYmFpeG8pLCBlL291IGRpc3RyaWJ1aXIgYSBzdWEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvIChpbmNsdWluZG8gbyByZXN1bW8pIHBvciB0b2RvIG8gbXVuZG8gbm8gZm9ybWF0byBpbXByZXNzbyBlIGVsZXRyw7RuaWNvIGUgZW0gcXVhbHF1ZXIgbWVpbywgaW5jbHVpbmRvIG9zIGZvcm1hdG9zIMOhdWRpbyBvdSB2w61kZW8uCgpWb2PDqiBjb25jb3JkYSBxdWUgYSBVRk1BIHBvZGUsIHNlbSBhbHRlcmFyIG8gY29udGXDumRvLCB0cmFuc3BvciBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gcGFyYSBxdWFscXVlciBtZWlvIG91IGZvcm1hdG8gcGFyYSBmaW5zIGRlIHByZXNlcnZhw6fDo28uCgpWb2PDqiB0YW1iw6ltIGNvbmNvcmRhIHF1ZSBhIFVGTUEgcG9kZSBtYW50ZXIgbWFpcyBkZSB1bWEgY8OzcGlhIGRlIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gcGFyYSBmaW5zIGRlIHNlZ3VyYW7Dp2EsIGJhY2stdXAgZSBwcmVzZXJ2YcOnw6NvLgoKVm9jw6ogZGVjbGFyYSBxdWUgYSBzdWEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvIMOpIG9yaWdpbmFsIGUgcXVlIHZvY8OqIHRlbSBvIHBvZGVyIGRlIGNvbmNlZGVyIG9zIGRpcmVpdG9zIGNvbnRpZG9zIG5lc3RhIGxpY2Vuw6dhLiBWb2PDqiB0YW1iw6ltIGRlY2xhcmEgcXVlIG8gZGVww7NzaXRvIGRhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gbsOjbywgcXVlIHNlamEgZGUgc2V1IGNvbmhlY2ltZW50bywgaW5mcmluZ2UgZGlyZWl0b3MgYXV0b3JhaXMgZGUgbmluZ3XDqW0uCgpDYXNvIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyBjb250ZW5oYSBtYXRlcmlhbCBxdWUgdm9jw6ogbsOjbyBwb3NzdWkgYSB0aXR1bGFyaWRhZGUgZG9zIGRpcmVpdG9zIGF1dG9yYWlzLCB2b2PDqiBkZWNsYXJhIHF1ZSBvYnRldmUgYSBwZXJtaXNzw6NvIGlycmVzdHJpdGEgZG8gZGV0ZW50b3IgZG9zIGRpcmVpdG9zIGF1dG9yYWlzIHBhcmEgY29uY2VkZXIgw6AgVUZNQSBvcyBkaXJlaXRvcyBhcHJlc2VudGFkb3MgbmVzdGEgbGljZW7Dp2EsIGUgcXVlIGVzc2UgbWF0ZXJpYWwgZGUgcHJvcHJpZWRhZGUgZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUgaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3Ugbm8gY29udGXDumRvIGRhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyBvcmEgZGVwb3NpdGFkYS4KCkNBU08gQSBURVNFIE9VIERJU1NFUlRBw4fDg08gT1JBIERFUE9TSVRBREEgVEVOSEEgU0lETyBSRVNVTFRBRE8gREUgVU0gUEFUUk9Dw41OSU8gT1UgQVBPSU8gREUgVU1BIEFHw4pOQ0lBIERFIEZPTUVOVE8gT1UgT1VUUk8gT1JHQU5JU01PIFFVRSBOw4NPIFNFSkEgQSBVRk1BLCBWT0PDiiBERUNMQVJBIFFVRSBSRVNQRUlUT1UgVE9ET1MgRSBRVUFJU1FVRVIgRElSRUlUT1MgREUgUkVWSVPDg08gQ09NTyBUQU1Cw4lNIEFTIERFTUFJUyBPQlJJR0HDh8OVRVMgRVhJR0lEQVMgUE9SIENPTlRSQVRPIE9VIEFDT1JETy4KCkEgVUZNQSBzZSBjb21wcm9tZXRlIGEgaWRlbnRpZmljYXIgY2xhcmFtZW50ZSBvIHNldSBub21lIG91IG8ocykgbm9tZShzKSBkbyhzKSBkZXRlbnRvcihlcykgZG9zIGRpcmVpdG9zIGF1dG9yYWlzIGRhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbywgZSBuw6NvIGZhcsOhIHF1YWxxdWVyIGFsdGVyYcOnw6NvLCBhbMOpbSBkYXF1ZWxhcyBjb25jZWRpZGFzIHBvciBlc3RhIGxpY2Vuw6dhLgoKRGVjbGFyYSB0YW1iw6ltIHF1ZSB0b2RhcyBhcyBhZmlsaWHDp8O1ZXMgY29ycG9yYXRpdmFzIG91IGluc3RpdHVjaW9uYWlzIGUgdG9kYXMgYXMgZm9udGVzIGRlIGFwb2lvIGZpbmFuY2Vpcm8gYW8gdHJhYmFsaG8gZXN0w6NvIGRldmlkYW1lbnRlIGNpdGFkYXMgb3UgbWVuY2lvbmFkYXMgZSBjZXJ0aWZpY2EgcXVlIG7Do28gaMOhIG5lbmh1bSBpbnRlcmVzc2UgY29tZXJjaWFsIG91IGFzc29jaWF0aXZvIHF1ZSByZXByZXNlbnRlIGNvbmZsaXRvIGRlIGludGVyZXNzZSBlbSBjb25leMOjbyBjb20gbyB0cmFiYWxobyBzdWJtZXRpZG8uCgoKCgoKCgo=Biblioteca Digital de Teses e Dissertaçõeshttps://tedebc.ufma.br/jspui/PUBhttp://tedebc.ufma.br:8080/oai/requestrepositorio@ufma.br||repositorio@ufma.bropendoar:21312021-09-23T14:48:10Biblioteca Digital de Teses e Dissertações da UFMA - Universidade Federal do Maranhão (UFMA)false |
dc.title.por.fl_str_mv |
Reconhecimento Automático de Fonemas via RNA Profunda |
dc.title.alternative.eng.fl_str_mv |
Automatic Phoneme Recognition via Deep ANN |
title |
Reconhecimento Automático de Fonemas via RNA Profunda |
spellingShingle |
Reconhecimento Automático de Fonemas via RNA Profunda CARVALHO, Mateus Barros Frota de Detecção de objetos Reconhecimento de fala Reconhecimento de fonemas Object detection Voice recognition Phoneme recognition Ciência da Computação |
title_short |
Reconhecimento Automático de Fonemas via RNA Profunda |
title_full |
Reconhecimento Automático de Fonemas via RNA Profunda |
title_fullStr |
Reconhecimento Automático de Fonemas via RNA Profunda |
title_full_unstemmed |
Reconhecimento Automático de Fonemas via RNA Profunda |
title_sort |
Reconhecimento Automático de Fonemas via RNA Profunda |
author |
CARVALHO, Mateus Barros Frota de |
author_facet |
CARVALHO, Mateus Barros Frota de |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
ALMEIDA NETO, Areolino de |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/8041675571955870 |
dc.contributor.referee1.fl_str_mv |
ALMEIDA NETO, Areolino de |
dc.contributor.referee1Lattes.fl_str_mv |
http://lattes.cnpq.br/8041675571955870 |
dc.contributor.referee2.fl_str_mv |
OLIVEIRA, Alexandre César Muniz de |
dc.contributor.referee2Lattes.fl_str_mv |
http://lattes.cnpq.br/5225588855422632 |
dc.contributor.referee3.fl_str_mv |
SILVA, Rogério Moreira Lima |
dc.contributor.referee3Lattes.fl_str_mv |
http://lattes.cnpq.br/0490351544174740 |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/2756606178387194 |
dc.contributor.author.fl_str_mv |
CARVALHO, Mateus Barros Frota de |
contributor_str_mv |
ALMEIDA NETO, Areolino de ALMEIDA NETO, Areolino de OLIVEIRA, Alexandre César Muniz de SILVA, Rogério Moreira Lima |
dc.subject.por.fl_str_mv |
Detecção de objetos Reconhecimento de fala Reconhecimento de fonemas |
topic |
Detecção de objetos Reconhecimento de fala Reconhecimento de fonemas Object detection Voice recognition Phoneme recognition Ciência da Computação |
dc.subject.eng.fl_str_mv |
Object detection Voice recognition Phoneme recognition |
dc.subject.cnpq.fl_str_mv |
Ciência da Computação |
description |
This work presents a phoneme recognition model using object detection techniques. The Single Shot Detection detector was used in conjunction with the MobileNet convolutional network architecture. The databases used in model training were TIMIT and LibriSpeech, both have spoken audios in English. To generate a graphical representation using the audiobases, for each audio, its spectrogram was calculated on the Mel scale and to train the algorithm of phoneme location detection, the temporal position of the occurrence of each phoneme in its respective was noted for its spectrogram. Additionally, it was necessary to increase the training data set, in order to provide improvement in the generalization of the model and for that, the two databases were joined and data augmentation techniques were applied to audios. The results of this work were close to the results obtained in other state of the art works. This research used two models with different architectures: the MobileNet-Large architecture, which obtained an accuracy of 0.72 mAP@0.5IOU and an error rate per phoneme of 19.47 % and the MobileNet-Small architecture, which obtained an accuracy of 0.63 mAP@0.5IOU and error rate per phoneme equal to 31.02 %. |
publishDate |
2020 |
dc.date.issued.fl_str_mv |
2020-12-11 |
dc.date.accessioned.fl_str_mv |
2021-09-23T14:48:10Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
CARVALHO, Mateus Barros Frota de. Reconhecimento Automático de Fonemas via RNA Profunda. 2020. 68 f. Dissertação (Programa de Pós-Graduação em Ciência da Computação/CCET) - Universidade Federal do Maranhão, São Luís, 2020. |
dc.identifier.uri.fl_str_mv |
https://tedebc.ufma.br/jspui/handle/tede/tede/3355 |
identifier_str_mv |
CARVALHO, Mateus Barros Frota de. Reconhecimento Automático de Fonemas via RNA Profunda. 2020. 68 f. Dissertação (Programa de Pós-Graduação em Ciência da Computação/CCET) - Universidade Federal do Maranhão, São Luís, 2020. |
url |
https://tedebc.ufma.br/jspui/handle/tede/tede/3355 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Federal do Maranhão |
dc.publisher.program.fl_str_mv |
PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO/CCET |
dc.publisher.initials.fl_str_mv |
UFMA |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
DEPARTAMENTO DE INFORMÁTICA/CCET |
publisher.none.fl_str_mv |
Universidade Federal do Maranhão |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da UFMA instname:Universidade Federal do Maranhão (UFMA) instacron:UFMA |
instname_str |
Universidade Federal do Maranhão (UFMA) |
instacron_str |
UFMA |
institution |
UFMA |
reponame_str |
Biblioteca Digital de Teses e Dissertações da UFMA |
collection |
Biblioteca Digital de Teses e Dissertações da UFMA |
bitstream.url.fl_str_mv |
http://tedebc.ufma.br:8080/bitstream/tede/3355/2/MATEUS-CARVALHO.pdf http://tedebc.ufma.br:8080/bitstream/tede/3355/1/license.txt |
bitstream.checksum.fl_str_mv |
9136b046c2cd96099f89eac7609bf9b1 97eeade1fce43278e63fe063657f8083 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da UFMA - Universidade Federal do Maranhão (UFMA) |
repository.mail.fl_str_mv |
repositorio@ufma.br||repositorio@ufma.br |
_version_ |
1797048332310282240 |