Hardware pipelining of repetitive patterns in processor instruction traces

Thumbnail Image
Date
2013
Authors
João Bispo
João Paiva Cardoso
Monteiro,J
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Dynamic partitioning is a promising technique where computations are transparently moved from a General Purpose Processor (GPP) to a coprocessor during application execution. To be effective, the mapping of computations to the coprocessor needs to consider aggressive optimizations. One of the mapping optimizations is loop pipelining, a technique extensively studied and known to allow substantial performance improvements. This paper describes a technique for pipelining Megablocks, a type of runtime loop developed for dynamic partitioning. The technique transforms the body of Mega-blocks into an acyclic dataflow graph which can be fully pipe-lined and is based on the atomic execution of loop iterations. For a set of 9 benchmarks without memory operations, we generated pipelined hardware versions of the loops and esti-mate that the presented loop pipelining technique increases the average speedup of non-pipelined coprocessor accelerated designs from 1.6× to 2.2×. For a larger set of 61 benchmarks which include memory operations, we estimate through simulation a speedup increase from 2.5× to 5.6× with this technique.
Description
Keywords
Citation