Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces

Nuno Miguel Paulino; João Canas Ferreira; João Paiva Cardoso

dc.contributor.author	Nuno Miguel Paulino	en
dc.contributor.author	João Canas Ferreira	en
dc.contributor.author	João Paiva Cardoso	en
dc.contributor.other	5550	en
dc.contributor.other	5802	en
dc.contributor.other	473	en
dc.date.accessioned	2023-05-05T09:32:19Z
dc.date.available	2023-05-05T09:32:19Z
dc.date.issued	2017	en
dc.description.abstract	Many embedded applications process large amounts of data using regular computational kernels, amenable to acceleration by specialized hardware coprocessors. To reduce the significant design effort, the dedicated hardware may be automatically generated, usually starting from the application's source or binary code. This paper presents a modulo-scheduled loop accelerator capable of executing multiple loops and a supporting toolchain. A generation/scheduling procedure, which fully relies on MicroBlaze instruction traces, produces accelerator instances, customized in terms of functional units and interconnections. The accelerators support integer and single-precision floating-point arithmetic, and exploit instruction-level parallelism, loop pipelining, and memory access parallelism via two read/write ports. A complete implementation of the proposed architecture is evaluated in a Virtex-7 device. Augmenting a MicroBlaze processor with a tailored accelerator achieves a geometric mean speedup, over software-only execution, of 6.61x for 13 floating-point kernels from the Livermore Loops set, and of 4.08x for 11 integer kernels from Texas Instruments' IMGLIB. The proposed customized accelerators are compared with ALU-based ones. The average specialized accelerator requires only 0.47x the number of field-programmable gate array slices of an accelerator with four ALUs. A geometric mean speedup of 1.78x over a four-issue very long instruction word (without floating-point support) was obtained for the integer kernels.	en
dc.identifier	P-00M-AM7	en
dc.identifier.uri	http://dx.doi.org/10.1109/tvlsi.2016.2573640	en
dc.identifier.uri	https://repositorio.inesctec.pt/handle/123456789/13816
dc.language	eng	en
dc.rights	info:eu-repo/semantics/openAccess	en
dc.title	Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces	en
dc.type	Publication	en

Files

Original bundle

Name: P-00M-AM7.pdf
Size: 2.71 MB
Format: Adobe Portable Document Format
Collections

HumanISE - Indexed Articles in Journals