Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection

Patil, Ankur T.; Acharya, Rajul; Patil, Hemant A.; Guido, Rodrigo Capobianco [UNESP]

Bibliographic details
Main author:	Patil, Ankur T.
Publication date:	2022
Other authors:	Acharya, Rajul; Patil, Hemant A.; Guido, Rodrigo Capobianco [UNESP]
Document type:	Article
Language:	eng
Source title:	Repositório Institucional da UNESP
Full text:	http://dx.doi.org/10.1016/j.csl.2021.101281 http://hdl.handle.net/11449/233527
Abstract:	In the scope of voice biometrics, the term replay attack (RA) refers to the dishonest attempt made by an impostor to spoof someone else's identity by replaying the subject's previously recorded speech close to the Automatic Speaker Verification (ASV) system under attack. State-of-the-art strategies for RA detection, such as the Enhanced Teager Energy Cepstral Coefficients (ETECC), have shown promising results due to their precision in measuring energy from high-frequency components of speech, as a function of two recently defined concepts: signal mass and the Enhanced Teager Energy Operator (ETEO). Nevertheless, since the replay mechanism prominently deteriorates the speech signal spectrum precisely in those spectral zones, we propose the association of ETEO with different strategies to further improve the previous results in obtaining effective countermeasures against RAs. Specifically, this paper provides comprehensive evaluations that include a detailed mathematical analysis, a simulation on amplitude and frequency modulated (AM–FM) signals, and a spectrographic inspection involving different filterbank structures, along with their experimental results. In addition, ETEO-derived features are contrasted with existing feature sets by using Paraconsistent Feature Engineering (PFE) for feature ranking, expanding our previously published results. Lastly, experiments are performed with the ASVSpoof-2017 version 2.0 dataset, the Realistic Replay Attack Microphone Array Speech Corpus (ReMASC), the BTAS-2016 dataset, the ASVSpoof-2019 challenge dataset, and the ASVSpoof-2015 challenge dataset, considering Gaussian Mixture Models (GMMs), Convolutional Neural Networks (CNNs) and Light-CNN architectures as classifiers. The standalone ETECC-GMM system showed the best performance, producing equal error rates (EERs) of 5.55% and 10.75% on the development and evaluation sets, respectively.
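As background for the abstract's remark about measuring energy from high-frequency components, the following is a minimal sketch of the classical discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It is not the ETEO or the ETECC pipeline described in the paper, whose definitions are not given in this record. The function names `teager_energy` and `tone`, and the chosen amplitudes and frequencies, are illustrative assumptions only.

```python
import numpy as np

def teager_energy(x):
    """Classical discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    This is the standard operator, NOT the Enhanced TEO (ETEO) of the paper.
    The output has two fewer samples than the input (no edge padding).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def tone(amplitude, omega, num_samples=256):
    """Hypothetical helper: a pure cosine with digital frequency omega (rad/sample)."""
    n = np.arange(num_samples)
    return amplitude * np.cos(omega * n)

# For A*cos(omega*n), the operator yields the constant A^2 * sin(omega)^2,
# so a tone of the same amplitude has more Teager energy at higher frequency.
low_freq_energy = teager_energy(tone(1.0, 0.1))
high_freq_energy = teager_energy(tone(1.0, 1.0))

print(low_freq_energy.mean(), high_freq_energy.mean())
```

Because the result depends on both amplitude and frequency, the operator weights high-frequency content more heavily than a plain squared-amplitude energy would, which is one plausible reading of why Teager-energy-based features are sensitive to spectral degradation in that region.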

Item metadata

id	UNSP_051de06c71b8b5e3dbba27f87342e8f0
oai_identifier_str	oai:repositorio.unesp.br:11449/233527
network_acronym_str	UNSP
network_name_str	Repositório Institucional da UNESP
repository_id_str	2946
spelling	Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Speech Research Lab Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT); Instituto de Biociências Letras e Ciências Exatas Unesp - Univ Estadual Paulista (São Paulo State University), Rua Cristóvão Colombo 2265, Jd Nazareth; FAPESP: 2019/04475-0; FAPESP: 306808/2018-8; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T15:58:38.802562
dc.title.none.fl_str_mv	Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection
author	Patil, Ankur T.
author_facet	Patil, Ankur T.; Acharya, Rajul; Patil, Hemant A.; Guido, Rodrigo Capobianco [UNESP]
author_role	author
author2	Acharya, Rajul; Patil, Hemant A.; Guido, Rodrigo Capobianco [UNESP]
author2_role	author author author
dc.contributor.none.fl_str_mv	Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT); Universidade Estadual Paulista (UNESP)
dc.subject.por.fl_str_mv	Automatic speaker verification (ASV); Enhanced Teager Energy Cepstral Coefficients (ETECCs); Enhanced Teager Energy Operator (ETEO); Handcrafted features; Paraconsistent Feature Engineering (PFE); Replay attacks (RAs)
publishDate	2022
dc.date.none.fl_str_mv	2022-05-01T09:00:56Z 2022-05-01T09:00:56Z 2022-03-01
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://dx.doi.org/10.1016/j.csl.2021.101281; Computer Speech and Language, v. 72; 1095-8363; 0885-2308; http://hdl.handle.net/11449/233527; 10.1016/j.csl.2021.101281; 2-s2.0-85114778313
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Computer Speech and Language
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.source.none.fl_str_mv	Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP
