A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs
Main author: | Véstias, Mário |
---|---|
Publication date: | 2020 |
Other authors: | Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.21/12726 |
Abstract: | Convolutional neural networks have become the state of the art of machine learning for a vast set of applications, especially for image classification and object detection. There are several advantages to running inference on these models at the edge, including real-time performance and data privacy. The high computing and memory requirements of convolutional neural networks have been major obstacles to the broader deployment of CNNs on edge devices. Data quantization is an optimization method that reduces the number of bits used to represent weights and activations of a network model, minimizing storage requirements and computing complexity. Quantization can be applied at the layer level, by using different bit widths in different layers: this is called hybrid quantization. This article proposes a new efficient and configurable architecture for running CNNs with hybrid quantization in low-density Field-Programmable Gate Arrays (FPGAs) targeting edge devices. The architecture has been implemented on the Xilinx ZYNQ7020/45 devices and is running the AlexNet and VGG16 networks. Running AlexNet, the architecture has a throughput up to 508 images per second on the ZYNQ7020 device, and 1639 images per second on the ZYNQ7045 device. Considering VGG16, the architecture delivers up to 43 images per second on the ZYNQ7020 device, and 81 images per second on the ZYNQ7045 device. The proposed hybrid architecture achieves up to 13.7 x improvement in performance compared to state-of-the-art solutions, with small accuracy degradation. |
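
The abstract defines hybrid quantization as using different bit widths in different layers. The sketch below illustrates that idea only; it is not the article's architecture or its quantization scheme, which this record does not describe. It assumes uniform symmetric quantization, and the function names, the toy weights, and the 8-bit and 4-bit choices are all hypothetical.

```python
from typing import Dict, List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed `bits`-bit integer codes plus one scale factor.

    Assumption: uniform symmetric quantization, chosen for illustration.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed representation")
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


def hybrid_quantize(
    layers: Dict[str, List[float]], bit_widths: Dict[str, int]
) -> Dict[str, Tuple[List[int], float]]:
    """Quantize every layer with its own bit width (hybrid quantization)."""
    return {name: quantize_symmetric(w, bit_widths[name]) for name, w in layers.items()}


def storage_bits(layers: Dict[str, List[float]], bit_widths: Dict[str, int]) -> int:
    """Total bits needed to store all weights at the given per-layer widths."""
    return sum(len(w) * bit_widths[name] for name, w in layers.items())


# Toy weights invented for this example (not taken from AlexNet or VGG16).
layers = {
    "conv1": [0.50, -0.25, 0.125, -1.0],
    "fc1": [0.7, -0.7, 0.35, 0.0, -0.175, 0.525],
}
hybrid_bits = {"conv1": 8, "fc1": 4}   # hypothetical per-layer widths
uniform_bits = {"conv1": 8, "fc1": 8}  # baseline: same width everywhere

quantized = hybrid_quantize(layers, hybrid_bits)
print("hybrid storage (bits):", storage_bits(layers, hybrid_bits))
print("uniform storage (bits):", storage_bits(layers, uniform_bits))
```

In this toy case the narrower width on `fc1` lowers the weight storage relative to the uniform 8-bit baseline, at the cost of a coarser step size (larger `scale`) for that layer. That is the storage versus accuracy trade-off the abstract refers to.
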
id |
RCAP_73f711169aca9c32a09c04786e1519a6 |
---|---|
oai_identifier_str |
oai:repositorio.ipl.pt:10400.21/12726 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs; Convolutional neural network; Deep learning; Embedded computing; Field-programmable gate array; Hybrid quantization; Convolutional neural networks have become the state of the art of machine learning for a vast set of applications, especially for image classification and object detection. There are several advantages to running inference on these models at the edge, including real-time performance and data privacy. The high computing and memory requirements of convolutional neural networks have been major obstacles to the broader deployment of CNNs on edge devices. Data quantization is an optimization method that reduces the number of bits used to represent weights and activations of a network model, minimizing storage requirements and computing complexity. Quantization can be applied at the layer level, by using different bit widths in different layers: this is called hybrid quantization. This article proposes a new efficient and configurable architecture for running CNNs with hybrid quantization in low-density Field-Programmable Gate Arrays (FPGAs) targeting edge devices. The architecture has been implemented on the Xilinx ZYNQ7020/45 devices and is running the AlexNet and VGG16 networks. Running AlexNet, the architecture has a throughput up to 508 images per second on the ZYNQ7020 device, and 1639 images per second on the ZYNQ7045 device. Considering VGG16, the architecture delivers up to 43 images per second on the ZYNQ7020 device, and 81 images per second on the ZYNQ7045 device. The proposed hybrid architecture achieves up to 13.7 x improvement in performance compared to state-of-the-art solutions, with small accuracy degradation.; IEEE; RCIPL; Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio; 2021-01-28T16:48:41Z; 2020-06-08; 2020-06-08T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; http://hdl.handle.net/10400.21/12726; eng; VÉSTIAS, Mário P.; [et al] – A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs. IEEE Access. ISSN 2169-3536. Vol. 8 (2020), pp. 107229-107243; 2169-3536; 10.1109/ACCESS.2020.3000444; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-08-03T10:06:12Z; oai:repositorio.ipl.pt:10400.21/12726; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T20:20:45.710824; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs |
title |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs |
spellingShingle |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs; Véstias, Mário; Convolutional neural network; Deep learning; Embedded computing; Field-programmable gate array; Hybrid quantization |
title_short |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs |
title_full |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs |
title_fullStr |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs |
title_full_unstemmed |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs |
title_sort |
A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs |
author |
Véstias, Mário |
author_facet |
Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
author_role |
author |
author2 |
Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
author2_role |
author author author |
dc.contributor.none.fl_str_mv |
RCIPL |
dc.contributor.author.fl_str_mv |
Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
dc.subject.por.fl_str_mv |
Convolutional neural network; Deep learning; Embedded computing; Field-programmable gate array; Hybrid quantization |
topic |
Convolutional neural network; Deep learning; Embedded computing; Field-programmable gate array; Hybrid quantization |
description |
Convolutional neural networks have become the state of the art of machine learning for a vast set of applications, especially for image classification and object detection. There are several advantages to running inference on these models at the edge, including real-time performance and data privacy. The high computing and memory requirements of convolutional neural networks have been major obstacles to the broader deployment of CNNs on edge devices. Data quantization is an optimization method that reduces the number of bits used to represent weights and activations of a network model, minimizing storage requirements and computing complexity. Quantization can be applied at the layer level, by using different bit widths in different layers: this is called hybrid quantization. This article proposes a new efficient and configurable architecture for running CNNs with hybrid quantization in low-density Field-Programmable Gate Arrays (FPGAs) targeting edge devices. The architecture has been implemented on the Xilinx ZYNQ7020/45 devices and is running the AlexNet and VGG16 networks. Running AlexNet, the architecture has a throughput up to 508 images per second on the ZYNQ7020 device, and 1639 images per second on the ZYNQ7045 device. Considering VGG16, the architecture delivers up to 43 images per second on the ZYNQ7020 device, and 81 images per second on the ZYNQ7045 device. The proposed hybrid architecture achieves up to 13.7 x improvement in performance compared to state-of-the-art solutions, with small accuracy degradation. |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-06-08 2020-06-08T00:00:00Z 2021-01-28T16:48:41Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.21/12726 |
url |
http://hdl.handle.net/10400.21/12726 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
VÉSTIAS, Mário P.; [et al] – A configurable architecture for running hybrid convolutional neural networks in low-density FPGAs. IEEE Access. ISSN 2169-3536. Vol. 8 (2020), pp. 107229-107243 2169-3536 10.1109/ACCESS.2020.3000444 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
IEEE |
publisher.none.fl_str_mv |
IEEE |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799133477459722240 |