Arbitrary streaming permutations with minimum memory and latency

2016 
Streaming architectures are a popular choice for data-intensive applications because they meet the high throughput requirements of such applications. When components are assembled into a streaming application, it is often necessary to build translation blocks between them that reorder the data elements into the order required by the subsequent processing stage. This paper addresses this need by developing a technique that realizes arbitrary permutations in a streaming architecture. The technique is parametrized to accommodate any data sequence size and any streaming width. It is applied to an architecture that receives continuous input at a rate of k elements per clock cycle and, after an initial start-up latency, outputs continuously at the same rate. In addition, both the memory usage and the latency through the memory array are minimized. The design is evaluated for permutations parametrized by size and streaming width, in terms of the number of memory elements and the memory depths required. The class of stride permutations is used for the specific experimental evaluation. On average, this technique and architecture require only half the latency and half the memory of other techniques.
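The abstract does not describe the memory-array architecture itself, so the sketch below only illustrates the two quantities it minimizes, start-up latency and the number of stored elements, under an idealized stream model. Assumptions (not from the paper): input element p arrives in cycle p // k, output slot q is emitted in cycle q // k + latency, an element that leaves in the same cycle it arrives needs no storage, and memory bank and port conflicts are ignored. The stride permutation uses one common convention, mapping index i*n + j to j*m + i (a transpose of an m x n row-major array). The function names `stride_permutation` and `ideal_latency_and_buffer` are hypothetical, and this is a counting argument, not the authors' method.

```python
from typing import List, Tuple


def stride_permutation(m: int, n: int) -> List[int]:
    """Return src such that output[q] = input[src[q]] for a stride permutation.

    Convention assumed here: input index i*n + j moves to output index j*m + i,
    i.e. an m x n row-major array is read out column by column.
    """
    src = [0] * (m * n)
    for i in range(m):
        for j in range(n):
            src[j * m + i] = i * n + j
    return src


def ideal_latency_and_buffer(src: List[int], k: int) -> Tuple[int, int]:
    """Minimum start-up latency and peak buffer occupancy in an idealized model.

    Hypothetical model: k elements arrive and k elements leave per clock cycle,
    and storage is counted only for elements held across a clock edge.
    """
    n = len(src)
    if n % k != 0:
        raise ValueError("sequence length must be a multiple of the stream width k")

    # Smallest delay so that every output slot's element has already arrived
    # by the cycle in which that slot is emitted.
    latency = max([0] + [src[q] // k - q // k for q in range(n)])

    # dst[p] = output slot that input element p is written to.
    dst = [0] * n
    for q, p in enumerate(src):
        dst[p] = q

    # Difference array over cycles: +1 when an element must be stored,
    # -1 in the cycle it finally leaves.
    horizon = n // k + latency + 1
    delta = [0] * (horizon + 1)
    for p in range(n):
        arrive = p // k
        depart = dst[p] // k + latency
        if depart > arrive:  # held across at least one clock edge
            delta[arrive] += 1
            delta[depart] -= 1

    # Running sum gives the number of elements held at the end of each cycle.
    peak = held = 0
    for c in range(horizon):
        held += delta[c]
        peak = max(peak, held)
    return latency, peak


if __name__ == "__main__":
    perm = stride_permutation(2, 3)
    print(perm, ideal_latency_and_buffer(perm, k=1))
```

For a 2 x 3 transpose streamed one element per cycle (k = 1), this model gives a start-up latency of 2 cycles and a peak of 2 stored elements; the identity permutation needs neither. A real k-wide hardware design must additionally schedule reads and writes so that no memory bank is accessed more often than its ports allow, which this simple count does not capture.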