Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection

Bibliographic details
Main author: Patil, Ankur T.
Publication date: 2022
Other authors: Acharya, Rajul; Patil, Hemant A.; Guido, Rodrigo Capobianco [UNESP]
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1016/j.csl.2021.101281
http://hdl.handle.net/11449/233527
Abstract: In voice biometrics, the term replay attack (RA) refers to a dishonest attempt by an impostor to spoof someone else's identity by replaying the subject's previously recorded speech close to the Automatic Speaker Verification (ASV) system under attack. State-of-the-art strategies for RA detection, such as the Enhanced Teager Energy Cepstral Coefficients (ETECC), have shown promising results due to their precision in measuring energy from the high-frequency components of speech, as a function of two recently defined concepts: signal mass and the Enhanced Teager Energy Operator (ETEO). Nevertheless, since the replay mechanism prominently deteriorates the speech signal spectrum precisely in those spectral zones, we propose associating ETEO with different strategies to further improve the previous results and obtain effective countermeasures against RAs. Specifically, this paper provides comprehensive evaluations that include a detailed mathematical analysis, a simulation on amplitude- and frequency-modulated (AM–FM) signals, and a spectrographic inspection involving different filterbank structures, along with their experimental results. In addition, ETEO-derived features are contrasted with existing feature sets by using Paraconsistent Feature Engineering (PFE) for feature ranking, expanding our previously published results. Lastly, experiments are performed on the ASVSpoof-2017 version 2.0 dataset, the Realistic Replay Attack Microphone Array Speech Corpus (ReMASC), the BTAS-2016 dataset, the ASVSpoof-2019 challenge dataset, and the ASVSpoof-2015 challenge dataset, with Gaussian Mixture Models (GMMs), Convolutional Neural Networks (CNNs), and Light-CNN architectures as classifiers. The standalone ETECC-GMM system showed the best performance, producing equal error rates (EERs) of 5.55% and 10.75% on the development and evaluation sets, respectively.
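As an illustration of why Teager-type operators respond strongly to high-frequency content, the following minimal sketch applies the classical discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], to two pure tones. This is the classical operator, not the ETEO used in the paper, whose definition is not given in this record; the sampling rate, amplitude, tone frequencies, and function names are assumptions chosen only for the example.

```python
import numpy as np


def teager_energy(x):
    """Classical discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the first and last samples have no
    neighbour on one side. This is NOT the ETEO from the paper.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Hypothetical settings, assumed only for this illustration.
fs = 16000.0          # sampling rate in Hz
amplitude = 0.5       # same amplitude for both tones
n = np.arange(400)    # sample indices


def tone(freq_hz):
    """Pure cosine at freq_hz with the shared amplitude."""
    return amplitude * np.cos(2.0 * np.pi * freq_hz / fs * n)


psi_low = teager_energy(tone(500.0))    # low-frequency tone
psi_high = teager_energy(tone(4000.0))  # high-frequency tone

# For a pure cosine the operator is constant: A^2 * sin^2(Omega),
# where Omega = 2*pi*f/fs is the digital frequency in radians/sample.
print(psi_low.mean(), psi_high.mean())
```

Both tones have the same amplitude, yet the operator output grows with frequency; a conventional squared-magnitude energy would be the same for both tones.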
id UNSP_051de06c71b8b5e3dbba27f87342e8f0
oai_identifier_str oai:repositorio.unesp.br:11449/233527
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Speech Research Lab Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)
Instituto de Biociências Letras e Ciências Exatas Unesp - Univ Estadual Paulista (São Paulo State University), Rua Cristóvão Colombo 2265, Jd Nazareth
FAPESP: 2019/04475-0
FAPESP: 306808/2018-8
http://repositorio.unesp.br/oai/request
opendoar:2946
dc.title.none.fl_str_mv Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection
dc.contributor.none.fl_str_mv Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)
Universidade Estadual Paulista (UNESP)
dc.contributor.author.fl_str_mv Patil, Ankur T.
Acharya, Rajul
Patil, Hemant A.
Guido, Rodrigo Capobianco [UNESP]
dc.subject.por.fl_str_mv Automatic speaker verification (ASV)
Enhanced Teager Energy Cepstral Coefficients (ETECCs)
Enhanced Teager Energy Operator (ETEO)
Handcrafted features
Paraconsistent Feature Engineering (PFE)
Replay attacks (RAs)
publishDate 2022
dc.date.none.fl_str_mv 2022-05-01T09:00:56Z
2022-05-01T09:00:56Z
2022-03-01
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1016/j.csl.2021.101281
Computer Speech and Language, v. 72.
1095-8363
0885-2308
http://hdl.handle.net/11449/233527
10.1016/j.csl.2021.101281
2-s2.0-85114778313
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Computer Speech and Language
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_ 1803649523264782336