Learning Streamed Attention Network from Descriptor Images for Cross-Resolution 3D Face Recognition

Bibliographic details
Main author: Neto, João Baptista Cardia
Publication date: 2023
Other authors: Ferrari, Claudio; Marana, Aparecido Nilceu [UNESP]; Berretti, Stefano; Del Bimbo, Alberto
Document type: Article
Language: English
Source: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1145/3527158
http://hdl.handle.net/11449/249658
Abstract: In this article, we propose a hybrid framework for cross-resolution 3D face recognition that uses a Streamed Attention Network (SAN) to combine handcrafted features with Convolutional Neural Networks (CNNs). It consists of two main stages: first, we process the depth images to extract low-level surface descriptors and derive the corresponding Descriptor Images (DIs), represented as four-channel images. To build the DIs, we propose a variation of the 3D Local Binary Pattern (3DLBP) operator that encodes depth differences using a sigmoid function. Then, we design a CNN that learns from these DIs. The peculiarity of our solution lies in processing each channel of the input image separately and fusing the contributions of the channels by means of both self- and cross-attention mechanisms. This strategy shows two main advantages over the direct application of deep CNNs to depth images of the face: on the one hand, the DIs reduce the diversity between high- and low-resolution data by encoding surface properties that are robust to resolution differences; on the other hand, they allow better exploitation of the richer information provided by the low-level features, resulting in improved recognition. We evaluated the proposed architecture in a challenging cross-dataset, cross-resolution scenario. To this end, we first train the network on scanner-resolution 3D data. Next, we use the pre-trained network as a feature extractor on low-resolution data, where the output of the last fully connected layer is used as the face descriptor. In addition to standard benchmarks, we also perform experiments on a newly collected dataset of paired high- and low-resolution 3D faces. We use the high-resolution data as the gallery, while the low-resolution faces are used as probes, allowing us to assess the real gap between these two types of data. Extensive experiments on low-resolution 3D face benchmarks show promising results with respect to state-of-the-art methods.
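The abstract only outlines the Descriptor Image (DI) construction; the exact operator is defined in the full paper. Below is a minimal NumPy sketch of one plausible reading, in which the hard thresholds of the original 3DLBP encoding are replaced by sigmoids over the depth differences of the 8-neighbourhood. The neighbourhood weights, the three magnitude scales, and the normalisation are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(x, tau=1.0):
    """Smooth squashing of depth differences; tau controls the slope."""
    return 1.0 / (1.0 + np.exp(-x / tau))

def descriptor_image(depth, tau=1.0):
    """Build a four-channel Descriptor Image (DI) from a depth image.

    Channel 0 is a soft version of the classic LBP sign layer; channels
    1-3 encode the magnitude of the depth differences at three scales
    (an illustrative choice, not the paper's exact encoding).
    """
    depth = np.asarray(depth, dtype=np.float32)
    h, w = depth.shape
    # 8-neighbourhood offsets (row, col), as in the original LBP.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    di = np.zeros((h, w, 4), dtype=np.float32)
    padded = np.pad(depth, 1, mode="edge")
    for i, (dr, dc) in enumerate(offsets):
        neigh = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        dd = neigh - depth                       # signed depth difference
        di[..., 0] += sigmoid(dd, tau) * 2 ** i  # soft sign bit, LBP-style weight
        mag = np.abs(dd)
        for c, scale in enumerate((1.0, 2.0, 4.0), start=1):
            di[..., c] += sigmoid(mag - scale, tau) * 2 ** i
    return di / 255.0                            # codes normalised to [0, 1]

# Example: a random 64x64 depth map yields a (64, 64, 4) DI.
di = descriptor_image(np.random.rand(64, 64))
```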
Affiliations: São Paulo State Technological College (FATEC), Rua Maranhão, 898, Catanduva; Department of Architecture and Engineering, University of Parma, Parco Area delle Scienze, 181/A; Recogna Laboratory, São Paulo State University (UNESP), Av. Eng. Luís Edmundo Carrijo Coube; MICC, University of Florence, Viale Giovanni Battista Morgagni, 65
Keywords: 3D face recognition; convolutional neural networks; feature descriptors; self- and cross-attention
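The keywords above name the two fusion mechanisms used to combine the per-channel streams. The following PyTorch sketch illustrates the general idea described in the abstract: one small CNN branch per DI channel, whose outputs are fused by self-attention among the channel tokens and cross-attention from a learned query. Layer counts, dimensions, and attention placement are illustrative assumptions, not the published SAN architecture.

```python
import torch
import torch.nn as nn

class Stream(nn.Module):
    """Small CNN branch applied to a single DI channel."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.net(x)                       # (B, dim)

class StreamedAttentionNet(nn.Module):
    """One stream per DI channel, fused by self- and cross-attention."""
    def __init__(self, channels=4, dim=64, n_ids=500):
        super().__init__()
        self.streams = nn.ModuleList([Stream(dim) for _ in range(channels)])
        # Self-attention among the per-channel feature tokens.
        self.self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Cross-attention: a learned query attends over the channel tokens.
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fc = nn.Linear(dim, 256)            # last FC output = face descriptor
        self.classifier = nn.Linear(256, n_ids)  # identity logits used for training

    def forward(self, di):                       # di: (B, 4, H, W)
        tokens = torch.stack(
            [s(di[:, c:c + 1]) for c, s in enumerate(self.streams)], dim=1
        )                                        # (B, 4, dim)
        tokens, _ = self.self_attn(tokens, tokens, tokens)
        query = self.query.expand(di.size(0), -1, -1)
        fused, _ = self.cross_attn(query, tokens, tokens)   # (B, 1, dim)
        descriptor = self.fc(fused.squeeze(1))   # (B, 256)
        return descriptor, self.classifier(descriptor)

# Example: a batch of 2 four-channel DIs of size 96x96.
desc, logits = StreamedAttentionNet()(torch.randn(2, 4, 96, 96))
```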
Published in: ACM Transactions on Multimedia Computing, Communications and Applications, v. 19, n. 1, 2023.
ISSN: 1551-6857; 1551-6865
DOI: 10.1145/3527158
Handle: http://hdl.handle.net/11449/249658
Scopus ID: 2-s2.0-85148038627
Access: open access
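The abstract describes the cross-resolution evaluation protocol: the network trained on scanner-resolution data is used as a feature extractor, the output of the last fully connected layer serves as the face descriptor, and high-resolution scans form the gallery while low-resolution scans act as probes. A minimal sketch of that matching step follows, assuming a model like the one sketched above that returns (descriptor, logits); the use of cosine similarity is an assumption, not a detail stated in the abstract.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def rank1_rate(model, gallery_dis, gallery_ids, probe_dis, probe_ids):
    """Rank-1 identification: low-resolution probes vs. a high-resolution gallery.

    `model` is assumed to return (descriptor, logits); only the descriptor
    (the last fully connected layer's output) is used for matching.
    """
    model.eval()
    gallery = F.normalize(model(gallery_dis)[0], dim=1)   # (G, D)
    probes = F.normalize(model(probe_dis)[0], dim=1)      # (P, D)
    sims = probes @ gallery.t()                           # cosine similarity matrix
    nearest = sims.argmax(dim=1)                          # closest gallery entry per probe
    return (gallery_ids[nearest] == probe_ids).float().mean().item()

# Illustrative usage with the network sketched above (random data):
# net = StreamedAttentionNet()
# acc = rank1_rate(net, torch.randn(10, 4, 96, 96), torch.arange(10),
#                  torch.randn(10, 4, 96, 96), torch.arange(10))
```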