Classificação da qualidade do sinal de voz em comunicação VoIP utilizando Deep Learning (Voice signal quality rating in VoIP communication using Deep Learning)

Costa, Lucas Hilário da

Bibliographic details
Main author:	Costa, Lucas Hilário da
Publication date:	2019
Document type:	Master's thesis
Language:	Portuguese (por)
Source title:	Repositório Institucional da UFLA
Full text:	http://repositorio.ufla.br/jspui/handle/1/36412
Abstract:	Voice over IP (VoIP) is currently one of the most widely used communication services; however, its quality depends on several external factors that degrade the voice signal in different ways. In communication channels, packet loss significantly affects the voice signal, lowering communication quality and directly affecting the user’s quality of experience (QoE). The objective of this work was to develop and implement two Deep Learning (DL) models able to classify the quality of the voice signal transmitted in a VoIP communication, affected mainly by packet loss. The proposed models are based on a Deep Neural Network (DNN); by analyzing voice signals degraded at a given packet loss rate (PLR), the models classify them into four distinct classes according to the user experience. For the tests, two databases were prepared, each containing four distinct classes: one was built from the ITU-T Recommendation P.862 database files, with different packet loss rates, and the other from the ITU-T Recommendation P.501 files, according to the mean opinion score (MOS) of each degraded file. To build the databases, a MATLAB program was implemented that degrades the original voice files by varying the packet loss rate. After processing, the files were grouped into four classes according to the packet loss rate applied to each original voice signal. For the MOS-based database, the degraded files were processed by the ITU-T Recommendation P.862 algorithm, which determines the MOS by comparing the degraded voice signal with the original signal of each audio file, and were then grouped into four classes according to the MOS obtained. To validate the models, two additional databases were prepared from VoxCeleb audio files, divided into four classes of 250 files each and grouped by PLR and by MOS.
The model using the PLR-based database reached 94% accuracy in validation, and the model using the MOS-based database reached 91% accuracy. On the additional databases, the model achieved 86.96% accuracy for the set prepared according to packet loss rate and 83.29% for the set prepared according to MOS. To assess the efficiency of the developed model, its results were compared with those of the ITU-T Recommendation P.563 and P.862 algorithms; on average, the MOS determined by the P.563 algorithm matched the MOS determined by the P.862 algorithm in 53.21% of cases. The results show that the generated models were able to classify the packet loss rate and the MOS index non-intrusively and with high accuracy. Among the non-intrusive methods, the 91% accuracy of the proposed model for the MOS index exceeds the 53.21% obtained by the ITU-T P.563 algorithm, both measured against the results of the intrusive ITU-T P.862 algorithm. The generated model is therefore able to determine the MOS of degraded voice files more efficiently than the ITU-T P.563 algorithm. An important contribution of this work is thus a non-intrusive evaluation model capable of identifying voice signal quality in real time.
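The degradation step described in the abstract can be illustrated with a minimal sketch. The thesis used a MATLAB program whose details are not given in this record; the Python below is only an assumed stand-in. The function name `apply_packet_loss`, the 160-sample frame (20 ms at 8 kHz), independent random losses, and silence substitution for lost packets are all assumptions made for illustration, not the author's method.

```python
import random


def apply_packet_loss(samples, plr, frame_len=160, seed=None):
    """Zero out whole frames of `samples`, each with probability `plr`.

    Assumptions (not taken from the thesis): a frame of `frame_len` samples
    stands in for one packet, losses are independent of each other, and a
    lost packet is replaced by silence (zeros).
    Returns (degraded_samples, lost_frames, total_frames).
    """
    if not 0.0 <= plr <= 1.0:
        raise ValueError("plr must be between 0 and 1")
    rng = random.Random(seed)
    degraded = list(samples)  # copy so the original signal is untouched
    lost_frames = 0
    total_frames = 0
    for start in range(0, len(degraded), frame_len):
        total_frames += 1
        if rng.random() < plr:
            end = min(start + frame_len, len(degraded))
            degraded[start:end] = [0.0] * (end - start)
            lost_frames += 1
    return degraded, lost_frames, total_frames


# Hypothetical usage: two seconds of a constant signal at 8 kHz, 10% PLR.
signal = [1.0] * 16000
degraded, lost, total = apply_packet_loss(signal, plr=0.10, seed=0)
print(f"{lost} of {total} frames dropped")
```

Degraded files produced this way could then be grouped by the PLR applied to them; the class boundaries used in the thesis are not stated in this record, so none are assumed here.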

Item metadata

id	UFLA_bf9eb89b5a2ce08bf94fc4cb6833ad08
oai_identifier_str	oai:localhost:1/36412
network_acronym_str	UFLA
network_name_str	Repositório Institucional da UFLA
repository_id_str
dc.title.none.fl_str_mv	Classificação da qualidade do sinal de voz em comunicação VoIP utilizando Deep Learning; Voice signal quality rating in VoIP communication using Deep Learning
title	Classificação da qualidade do sinal de voz em comunicação VoIP utilizando Deep Learning
author	Costa, Lucas Hilário da
author_facet	Costa, Lucas Hilário da
author_role	author
dc.contributor.none.fl_str_mv	Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes; Ferreira, Danton Diego; Begazo, Dante Coaquira
dc.contributor.author.fl_str_mv	Costa, Lucas Hilário da
dc.subject.por.fl_str_mv	Voice over IP (VoIP); Deep learning; Voice quality; Machine learning; ITU-T recommendation P.862; ITU-T recommendation P.563; ITU-T recommendation P.501; Voice over Internet Protocol (VoIP); Telecommunication systems
publishDate	2019
dc.date.none.fl_str_mv	2019-08-23T13:06:17Z 2019-08-23T13:06:17Z 2019-08-22 2019-06-26
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	COSTA, L. H. da. Classificação da qualidade do sinal de voz em comunicação VoIP utilizando Deep Learning. 2019. 94 p. Dissertação (Mestrado em Engenharia de Sistemas e Automação)–Universidade Federal de Lavras, Lavras, 2019. http://repositorio.ufla.br/jspui/handle/1/36412
identifier_str_mv	COSTA, L. H. da. Classificação da qualidade do sinal de voz em comunicação VoIP utilizando Deep Learning. 2019. 94 p. Dissertação (Mestrado em Engenharia de Sistemas e Automação)–Universidade Federal de Lavras, Lavras, 2019.
url	http://repositorio.ufla.br/jspui/handle/1/36412
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Lavras; Programa de Pós-graduação em Engenharia de Sistemas e Automação; UFLA; brasil; Departamento de Engenharia
publisher.none.fl_str_mv	Universidade Federal de Lavras; Programa de Pós-graduação em Engenharia de Sistemas e Automação; UFLA; brasil; Departamento de Engenharia
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFLA instname:Universidade Federal de Lavras (UFLA) instacron:UFLA
instname_str	Universidade Federal de Lavras (UFLA)
instacron_str	UFLA
institution	UFLA
reponame_str	Repositório Institucional da UFLA
collection	Repositório Institucional da UFLA
repository.name.fl_str_mv	Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv	nivaldo@ufla.br || repositorio.biblioteca@ufla.br
_version_	1815438964308312064
