Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces

Nuno Miguel Paulino; João Canas Ferreira; João Paiva Cardoso

Bibliographic details
Main author:	Nuno Miguel Paulino
Publication date:	2017
Other authors:	João Canas Ferreira, João Paiva Cardoso
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://repositorio.inesctec.pt/handle/123456789/5539 http://dx.doi.org/10.1109/tvlsi.2016.2573640
Abstract:	Many embedded applications process large amounts of data using regular computational kernels, amenable to acceleration by specialized hardware coprocessors. To reduce the significant design effort, the dedicated hardware may be automatically generated, usually starting from the application's source or binary code. This paper presents a modulo-scheduled loop accelerator capable of executing multiple loops and a supporting toolchain. A generation/scheduling procedure, which fully relies on MicroBlaze instruction traces, produces accelerator instances, customized in terms of functional units and interconnections. The accelerators support integer and single-precision floating-point arithmetic, and exploit instruction-level parallelism, loop pipelining, and memory access parallelism via two read/write ports. A complete implementation of the proposed architecture is evaluated in a Virtex-7 device. Augmenting a MicroBlaze processor with a tailored accelerator achieves a geometric mean speedup, over software-only execution, of 6.61x for 13 floating-point kernels from the Livermore Loops set, and of 4.08x for 11 integer kernels from Texas Instruments' IMGLIB. The proposed customized accelerators are compared with ALU-based ones. The average specialized accelerator requires only 0.47x the number of field-programmable gate array slices of an accelerator with four ALUs. A geometric mean speedup of 1.78x over a four-issue very long instruction word (without floating-point support) was obtained for the integer kernels.

Item metadata

id	RCAP_a5999655a9d5e28d1535d8223512bd20
oai_identifier_str	oai:repositorio.inesctec.pt:123456789/5539
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
title	Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces
author	Nuno Miguel Paulino
author_facet	Nuno Miguel Paulino João Canas Ferreira João Paiva Cardoso
author_role	author
author2	João Canas Ferreira João Paiva Cardoso
author2_role	author author
dc.contributor.author.fl_str_mv	Nuno Miguel Paulino João Canas Ferreira João Paiva Cardoso
publishDate	2017
dc.date.none.fl_str_mv	2017-01-01T00:00:00Z 2017 2018-01-05T16:05:14Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://repositorio.inesctec.pt/handle/123456789/5539 http://dx.doi.org/10.1109/tvlsi.2016.2573640
url	http://repositorio.inesctec.pt/handle/123456789/5539 http://dx.doi.org/10.1109/tvlsi.2016.2573640
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/embargoedAccess
eu_rights_str_mv	embargoedAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799131600662822913
