Reconhecimento Automático de Fonemas via RNA Profunda (Automatic Phoneme Recognition via Deep ANN)

CARVALHO, Mateus Barros Frota de

Bibliographic details
Main author:	CARVALHO, Mateus Barros Frota de
Publication date:	2020
Document type:	Dissertation (master's)
Language:	por (Portuguese)
Source title:	Biblioteca Digital de Teses e Dissertações da UFMA
Full text:	https://tedebc.ufma.br/jspui/handle/tede/tede/3355
Abstract:	This work presents a phoneme recognition model based on object detection techniques. The Single Shot MultiBox Detector (SSD) was used together with the MobileNet convolutional network architecture. The model was trained on the TIMIT and LibriSpeech databases, both of which contain spoken English audio. To obtain a graphical representation of the audio, a Mel-scale spectrogram was computed for each recording, and, to train the phoneme-localization detector, the temporal position of each phoneme occurrence was annotated on the corresponding spectrogram. The training set also had to be enlarged to improve the model's generalization; to this end, the two databases were merged and audio data augmentation techniques were applied. The results are close to those reported in other state-of-the-art works. Two models with different architectures were evaluated: MobileNet-Large, which achieved a mean average precision of 0.72 mAP@0.5IoU and a phoneme error rate of 19.47%, and MobileNet-Small, which achieved 0.63 mAP@0.5IoU and a phoneme error rate of 31.02%.
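The abstract reports two metrics: mAP@0.5IoU for the detector and phoneme error rate (PER) for recognition. The Python sketch below shows the standard definitions behind both, as a minimal illustration and not the dissertation's evaluation code: intersection over union (IoU) of two axis-aligned boxes, where under mAP@0.5IoU a detection is only eligible as a true positive if its IoU with a same-class ground-truth box is at least 0.5, and PER as the Levenshtein edit distance between phoneme sequences divided by the reference length. The box layout (time frames on x, Mel bins on y) and the TIMIT-style phoneme labels in the demo are illustrative assumptions.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max). On a spectrogram, x could be
# time frames and y Mel bins (an illustrative assumption, not the author's format).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = (S + D + I) / N: Levenshtein distance over phoneme labels
    divided by the number of reference phonemes N."""
    n, m = len(ref), len(hyp)
    if n == 0:
        raise ValueError("PER is undefined for an empty reference")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference phonemes
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis phonemes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[n][m] / n


if __name__ == "__main__":
    # Boxes of width 10 overlapping by 5: IoU = 200 / 600, below the 0.5 threshold.
    print(box_iou((0, 0, 10, 40), (5, 0, 15, 40)))  # 0.333...
    # One substitution (iy -> ih) and one deletion (hh) over 5 reference phonemes.
    print(phoneme_error_rate(["sh", "iy", "hh", "ae", "d"], ["sh", "ih", "ae", "d"]))  # 0.4
```

A full mAP@0.5IoU evaluation additionally ranks detections by confidence, lets each ground-truth phoneme be matched at most once, and averages precision per class before averaging across classes; those steps are omitted from this sketch.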

Item metadata

id	UFMA_3786ef45815fd44dfa376796e20f66c6
oai_identifier_str	oai:tede2:tede/3355
network_acronym_str	UFMA
network_name_str	Biblioteca Digital de Teses e Dissertações da UFMA
repository_id_str	2131
provenance	Submitted by Sheila MONTEIRO (sheila.monteiro@ufma.br) on 2021-09-23T14:48:10Z. No. of bitstreams: 1 (MATEUS-CARVALHO.pdf: 2251513 bytes, MD5 checksum 9136b046c2cd96099f89eac7609bf9b1). Made available in DSpace on 2021-09-23T14:48:10Z (GMT). Previous issue date: 2020-12-11.
bitstreams	ORIGINAL: MATEUS-CARVALHO.pdf (application/pdf, 2251513 bytes, MD5 9136b046c2cd96099f89eac7609bf9b1); LICENSE: license.txt (text/plain; charset=utf-8, 2255 bytes, MD5 97eeade1fce43278e63fe063657f8083)
repository	Biblioteca Digital de Teses e Dissertações (https://tedebc.ufma.br/jspui/), OAI-PMH endpoint http://tedebc.ufma.br:8080/oai/request, contact repositorio@ufma.br, OpenDOAR ID 2131
dc.title.por.fl_str_mv	Reconhecimento Automático de Fonemas via RNA Profunda
dc.title.alternative.eng.fl_str_mv	Automatic Phoneme Recognition via Deep ANN
author	CARVALHO, Mateus Barros Frota de
author_role	author
dc.contributor.advisor1.fl_str_mv	ALMEIDA NETO, Areolino de
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/8041675571955870
dc.contributor.referee1.fl_str_mv	ALMEIDA NETO, Areolino de
dc.contributor.referee1Lattes.fl_str_mv	http://lattes.cnpq.br/8041675571955870
dc.contributor.referee2.fl_str_mv	OLIVEIRA, Alexandre César Muniz de
dc.contributor.referee2Lattes.fl_str_mv	http://lattes.cnpq.br/5225588855422632
dc.contributor.referee3.fl_str_mv	SILVA, Rogério Moreira Lima
dc.contributor.referee3Lattes.fl_str_mv	http://lattes.cnpq.br/0490351544174740
dc.contributor.authorLattes.fl_str_mv	http://lattes.cnpq.br/2756606178387194
dc.contributor.author.fl_str_mv	CARVALHO, Mateus Barros Frota de
dc.subject.por.fl_str_mv	Detecção de objetos; Reconhecimento de fala; Reconhecimento de fonemas (object detection; speech recognition; phoneme recognition)
dc.subject.eng.fl_str_mv	Object detection; Voice recognition; Phoneme recognition
dc.subject.cnpq.fl_str_mv	Ciência da Computação (Computer Science)
publishDate	2020
dc.date.issued.fl_str_mv	2020-12-11
dc.date.accessioned.fl_str_mv	2021-09-23T14:48:10Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	CARVALHO, Mateus Barros Frota de. Reconhecimento Automático de Fonemas via RNA Profunda. 2020. 68 f. Dissertação (Programa de Pós-Graduação em Ciência da Computação/CCET) - Universidade Federal do Maranhão, São Luís, 2020.
dc.identifier.uri.fl_str_mv	https://tedebc.ufma.br/jspui/handle/tede/tede/3355
dc.language.iso.fl_str_mv	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal do Maranhão
dc.publisher.program.fl_str_mv	PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO/CCET
dc.publisher.initials.fl_str_mv	UFMA
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	DEPARTAMENTO DE INFORMÁTICA/CCET
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da UFMA instname:Universidade Federal do Maranhão (UFMA) instacron:UFMA
bitstream.url.fl_str_mv	http://tedebc.ufma.br:8080/bitstream/tede/3355/2/MATEUS-CARVALHO.pdf http://tedebc.ufma.br:8080/bitstream/tede/3355/1/license.txt
bitstream.checksum.fl_str_mv	9136b046c2cd96099f89eac7609bf9b1 97eeade1fce43278e63fe063657f8083
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da UFMA - Universidade Federal do Maranhão (UFMA)
repository.mail.fl_str_mv	repositorio@ufma.br
