Improving black-box speech-to-text systems via machine learning techniques

Schwade, Guilherme Vieira

Bibliographic details
Main author:	Schwade, Guilherme Vieira
Publication date:	2016
Document type:	Bachelor's thesis
Language:	eng
Source title:	Repositório Institucional da UFRGS
Full text:	http://hdl.handle.net/10183/147632
Abstract:	There are several ways a user can interact with a computer, and not every one is equally appropriate for all situations: a keyboard is more appropriate for typing, whereas a mouse is a better fit when the user needs to control the cursor with precision. In some complex systems, the user might need to execute several different tasks and might therefore need different ways to interact with the system. To simplify those interactions, voice commands might be a good strategy, since they often allow the user to specify the task to be executed with a richer input vocabulary than that available via other, more standard input devices. However, developing robust speech-to-text converters (STT converters) requires time and resources that development teams often do not have. Widely used STT converters are available on the internet, such as the Web Speech API from Google; these systems are very mature for general-context applications, for instance when they are used to analyze terms and words that occur in day-to-day conversations. However, they are often not effective when used to analyze context-specific terms, which occur only in particular systems or applications. Furthermore, these systems are usually black boxes and cannot be modified or improved by developers who wish to use them to solve particular specialized speech-to-text problems. To analyze possible solutions to this problem, we study the development of an additional layer of software, trained via machine learning techniques, to correct or adapt the imperfect transcriptions generated by a black-box STT converter when applied to a specific domain. In particular, we propose and evaluate several machine learning solutions to improve a complex flight-ticket management system to which we wish to add voice-control capabilities.
In the first part of this work, we discuss our motivation and describe the domain in which the proposed methods are evaluated. After that, the mathematical background is presented and we introduce possible solutions for the particular domain at hand. At the end, a critical analysis of the results is made and future work is discussed.
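The abstract describes a correction layer placed after a black-box STT converter, and the record's subject terms mention Levenshtein distance. The following is only a minimal sketch of that general idea, not the thesis's actual method: it snaps a transcript to the nearest entry of a small domain vocabulary by edit distance. The function names, the example vocabulary, and the `max_distance` threshold are all hypothetical choices made for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # drop ca
                               current[j - 1] + 1,       # add cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def correct(transcript: str, vocabulary: Sequence[str], max_distance: int = 2) -> str:
    """Return the closest domain term, or the transcript unchanged if none is near.

    `vocabulary` and `max_distance` are assumptions for this sketch.
    """
    if not vocabulary:
        return transcript
    text = transcript.lower()
    best = min(vocabulary, key=lambda term: levenshtein(text, term.lower()))
    if levenshtein(text, best.lower()) <= max_distance:
        return best
    return transcript


# Hypothetical domain commands for a flight-ticket system.
COMMANDS = ["book flight", "cancel ticket"]
print(correct("cancel tiket", COMMANDS))
print(correct("hello world", COMMANDS))
```

A fixed distance threshold is the simplest possible decision rule; a learned model, as the abstract proposes, would replace it with something trained on domain data.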

Item metadata

id	UFRGS-2_2b4e9ec67b0cbbfd007cfcefa4d2fdd5
oai_identifier_str	oai:www.lume.ufrgs.br:10183/147632
network_acronym_str	UFRGS-2
network_name_str	Repositório Institucional da UFRGS
repository_id_str
dc.title.pt_BR.fl_str_mv	Improving black-box speech-to-text systems via machine learning techniques
title	Improving black-box speech-to-text systems via machine learning techniques
author	Schwade, Guilherme Vieira
author_facet	Schwade, Guilherme Vieira
author_role	author
dc.contributor.author.fl_str_mv	Schwade, Guilherme Vieira
dc.contributor.advisor1.fl_str_mv	Silva, Bruno Castro da
contributor_str_mv	Silva, Bruno Castro da
dc.subject.por.fl_str_mv	Reconhecimento : Padroes Aprendizagem : Maquina
topic	Reconhecimento : Padroes Aprendizagem : Maquina Speech Recognition Machine learning Levenshtein distance Phonetic algorithm
dc.subject.eng.fl_str_mv	Speech Recognition Machine learning Levenshtein distance Phonetic algorithm
publishDate	2016
dc.date.accessioned.fl_str_mv	2016-08-25T02:16:26Z
dc.date.issued.fl_str_mv	2016
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/bachelorThesis
format	bachelorThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10183/147632
dc.identifier.nrb.pt_BR.fl_str_mv	000999675
url	http://hdl.handle.net/10183/147632
identifier_str_mv	000999675
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS
instname_str	Universidade Federal do Rio Grande do Sul (UFRGS)
instacron_str	UFRGS
institution	UFRGS
reponame_str	Repositório Institucional da UFRGS
collection	Repositório Institucional da UFRGS
bitstream.url.fl_str_mv	http://www.lume.ufrgs.br/bitstream/10183/147632/1/000999675.pdf http://www.lume.ufrgs.br/bitstream/10183/147632/2/000999675.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/147632/3/000999675.pdf.jpg
bitstream.checksum.fl_str_mv	281646cc7e2781cd6d2ab586148f08f4 8370f763cb9336bed074ff6025005ebc 621a53dd80a26c8921bd2056d64f58d6
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
repository.mail.fl_str_mv
_version_	1798486851991896064
