Learning Streamed Attention Network from Descriptor Images for Cross-Resolution 3D Face Recognition

Bibliographic details
Main author: Neto, João Baptista Cardia
Publication date: 2023
Other authors: Ferrari, Claudio; Marana, Aparecido Nilceu [UNESP]; Berretti, Stefano; Del Bimbo, Alberto
Document type: Article
Language: English
Source: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1145/3527158
http://hdl.handle.net/11449/249658
Abstract: In this article, we propose a hybrid framework for cross-resolution 3D face recognition that uses a Streamed Attention Network (SAN) to combine handcrafted features with Convolutional Neural Networks (CNNs). It consists of two main stages: first, we process the depth images to extract low-level surface descriptors and derive the corresponding Descriptor Images (DIs), represented as four-channel images. To build the DIs, we propose a variation of the 3D Local Binary Pattern (3DLBP) operator that encodes depth differences using a sigmoid function. Then, we design a CNN that learns from these DIs. The peculiarity of our solution lies in processing each channel of the input image separately and fusing the contributions of the channels by means of both self- and cross-attention mechanisms. This strategy shows two main advantages over the direct application of deep CNNs to depth images of the face: on the one hand, the DIs reduce the diversity between high- and low-resolution data by encoding surface properties that are robust to resolution differences; on the other hand, they allow better exploitation of the richer information provided by the low-level features, resulting in improved recognition. We evaluated the proposed architecture in a challenging cross-dataset, cross-resolution scenario. To this end, we first train the network on scanner-resolution 3D data. Next, we use the pre-trained network as a feature extractor on low-resolution data, where the output of the last fully connected layer is used as the face descriptor. In addition to standard benchmarks, we also perform experiments on a newly collected dataset of paired high- and low-resolution 3D faces. We use the high-resolution data as the gallery, while the low-resolution faces are used as probes, allowing us to assess the real gap between these two types of data. Extensive experiments on low-resolution 3D face benchmarks show promising results with respect to state-of-the-art methods.
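The abstract only outlines the Descriptor Image (DI) construction; the exact operator is defined in the full paper. Below is a minimal NumPy sketch of one plausible reading, in which the hard thresholds of the original 3DLBP encoding are replaced by sigmoids over the depth differences of the 8-neighbourhood. The neighbourhood weights, the three magnitude scales, and the normalisation are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(x, tau=1.0):
    """Smooth squashing of depth differences; tau controls the slope."""
    return 1.0 / (1.0 + np.exp(-x / tau))

def descriptor_image(depth, tau=1.0):
    """Build a four-channel Descriptor Image (DI) from a depth image.

    Channel 0 is a soft version of the classic LBP sign layer; channels
    1-3 encode the magnitude of the depth differences at three scales
    (an illustrative choice, not the paper's exact encoding).
    """
    depth = np.asarray(depth, dtype=np.float32)
    h, w = depth.shape
    # 8-neighbourhood offsets (row, col), as in the original LBP.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    di = np.zeros((h, w, 4), dtype=np.float32)
    padded = np.pad(depth, 1, mode="edge")
    for i, (dr, dc) in enumerate(offsets):
        neigh = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        dd = neigh - depth                       # signed depth difference
        di[..., 0] += sigmoid(dd, tau) * 2 ** i  # soft sign bit, LBP-style weight
        mag = np.abs(dd)
        for c, scale in enumerate((1.0, 2.0, 4.0), start=1):
            di[..., c] += sigmoid(mag - scale, tau) * 2 ** i
    return di / 255.0                            # codes normalised to [0, 1]

# Example: a random 64x64 depth map yields a (64, 64, 4) DI.
di = descriptor_image(np.random.rand(64, 64))
```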
Affiliations: São Paulo State Technological College (FATEC), Rua Maranhão, 898, Catanduva; Department of Architecture and Engineering, University of Parma, Parco Area delle Scienze, 181/A; Recogna Laboratory, São Paulo State University (UNESP), Av. Eng. Luís Edmundo Carrijo Coube; MICC, University of Florence, Viale Giovanni Battista Morgagni, 65
Keywords: 3D face recognition; convolutional neural networks; feature descriptors; self- and cross-attention
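The keywords above name the two fusion mechanisms used to combine the per-channel streams. The following PyTorch sketch illustrates the general idea described in the abstract: one small CNN branch per DI channel, whose outputs are fused by self-attention among the channel tokens and cross-attention from a learned query. Layer counts, dimensions, and attention placement are illustrative assumptions, not the published SAN architecture.

```python
import torch
import torch.nn as nn

class Stream(nn.Module):
    """Small CNN branch applied to a single DI channel."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.net(x)                       # (B, dim)

class StreamedAttentionNet(nn.Module):
    """One stream per DI channel, fused by self- and cross-attention."""
    def __init__(self, channels=4, dim=64, n_ids=500):
        super().__init__()
        self.streams = nn.ModuleList([Stream(dim) for _ in range(channels)])
        # Self-attention among the per-channel feature tokens.
        self.self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Cross-attention: a learned query attends over the channel tokens.
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fc = nn.Linear(dim, 256)            # last FC output = face descriptor
        self.classifier = nn.Linear(256, n_ids)  # identity logits used for training

    def forward(self, di):                       # di: (B, 4, H, W)
        tokens = torch.stack(
            [s(di[:, c:c + 1]) for c, s in enumerate(self.streams)], dim=1
        )                                        # (B, 4, dim)
        tokens, _ = self.self_attn(tokens, tokens, tokens)
        query = self.query.expand(di.size(0), -1, -1)
        fused, _ = self.cross_attn(query, tokens, tokens)   # (B, 1, dim)
        descriptor = self.fc(fused.squeeze(1))   # (B, 256)
        return descriptor, self.classifier(descriptor)

# Example: a batch of 2 four-channel DIs of size 96x96.
desc, logits = StreamedAttentionNet()(torch.randn(2, 4, 96, 96))
```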
Published in: ACM Transactions on Multimedia Computing, Communications and Applications, v. 19, n. 1, 2023.
ISSN: 1551-6857; 1551-6865
DOI: 10.1145/3527158
Handle: http://hdl.handle.net/11449/249658
Scopus ID: 2-s2.0-85148038627
Access: open access
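The abstract describes the cross-resolution evaluation protocol: the network trained on scanner-resolution data is used as a feature extractor, the output of the last fully connected layer serves as the face descriptor, and high-resolution scans form the gallery while low-resolution scans act as probes. A minimal sketch of that matching step follows, assuming a model like the one sketched above that returns (descriptor, logits); the use of cosine similarity is an assumption, not a detail stated in the abstract.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def rank1_rate(model, gallery_dis, gallery_ids, probe_dis, probe_ids):
    """Rank-1 identification: low-resolution probes vs. a high-resolution gallery.

    `model` is assumed to return (descriptor, logits); only the descriptor
    (the last fully connected layer's output) is used for matching.
    """
    model.eval()
    gallery = F.normalize(model(gallery_dis)[0], dim=1)   # (G, D)
    probes = F.normalize(model(probe_dis)[0], dim=1)      # (P, D)
    sims = probes @ gallery.t()                           # cosine similarity matrix
    nearest = sims.argmax(dim=1)                          # closest gallery entry per probe
    return (gallery_ids[nearest] == probe_ids).float().mean().item()

# Illustrative usage with the network sketched above (random data):
# net = StreamedAttentionNet()
# acc = rank1_rate(net, torch.randn(10, 4, 96, 96), torch.arange(10),
#                  torch.randn(10, 4, 96, 96), torch.arange(10))
```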