A Near-Memory Processor for Vector, Streaming and Bit Manipulation Workloads

2005 
Many important scientific and engineering applications execute sub-optimally on current commodity processors and servers. Because they frequently use caches ineffectively, they are often heavily bottlenecked by global memory bandwidth. In addition, they sometimes need to perform expensive bit-manipulation operations that commodity ISAs do not support efficiently. Moreover, an analysis of technology trends suggests that, however critical some of these applications are, future commodity processors and servers are unlikely to be tuned for them. To address this problem, this paper proposes the design of a simple co-processor onto which the main processor can off-load vector, streaming, and bit-manipulation computation. The co-processor is a blocked-multithreaded, narrow, in-order core with support for vectors, streams, and bit manipulation. It has no caches and high bandwidth to memory. For this reason, rather than because of its actual physical location, we call it a Near-Memory Processor (NMP). Our simulations show that a set of scientific applications runs much faster on the NMP than on an aggressive conventional processor: the speedups reach 18, with a geometric mean of 5.8 across 10 applications.

This work is supported by DARPA Contract NBCHC-02-0056 and NBCH30390004.
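The abstract does not detail the NMP's thread-switch policy. As a rough illustration of why a cacheless core pairs naturally with blocked multithreading, the toy Python model below runs one thread until it issues a memory load and then switches to another ready thread, so that other threads' work overlaps the memory latency. The function name, the "alu"/"load" op encoding, the round-robin policy, and the latency values are all hypothetical and are not taken from the paper.

```python
def simulate_blocked_mt(threads, mem_latency):
    """Toy model (hypothetical) of a blocked-multithreaded in-order core with no cache.

    threads: list of instruction streams; each op is "alu" (1 cycle) or
    "load" (1 cycle to issue, then the thread stalls for mem_latency cycles,
    since there is no cache to hit in). The core runs one thread until it
    issues a load, then switches to the next ready thread in round-robin
    order. Thread-switch overhead and bandwidth limits are ignored.
    Returns (total_cycles, busy_cycles).
    """
    n = len(threads)
    pc = [0] * n            # next op index per thread
    ready_at = [0] * n      # cycle at which each thread may issue again
    cycle = busy = 0
    last = n - 1            # round-robin pointer: first pick is thread 0

    def unfinished(t):
        return pc[t] < len(threads[t])

    while any(unfinished(t) for t in range(n)):
        order = [(last + 1 + i) % n for i in range(n)]
        runnable = [t for t in order if unfinished(t) and ready_at[t] <= cycle]
        if not runnable:
            # Every live thread is waiting on memory, so the core idles.
            cycle = min(ready_at[t] for t in range(n) if unfinished(t))
            continue
        t = last = runnable[0]
        # Run thread t until it blocks on a load or runs out of ops.
        while unfinished(t):
            op = threads[t][pc[t]]
            pc[t] += 1
            cycle += 1
            busy += 1
            if op == "load":
                ready_at[t] = cycle + mem_latency
                break
    # The run ends only when the last outstanding load has returned.
    return max([cycle] + ready_at), busy


# Hypothetical workload: three ALU ops per memory load, 40-cycle memory latency.
stream = ["alu", "alu", "alu", "load"] * 8
for k in (1, 2, 4, 8):
    total, busy = simulate_blocked_mt([stream] * k, mem_latency=40)
    print(k, "threads: utilization", round(busy / total, 2))
```

With a single thread the core sits idle for most of each memory latency; in this model, adding threads fills those idle cycles and utilization rises. A real design would also be bounded by memory bandwidth and thread-switch cost, which the sketch deliberately omits.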