Factored radix-8 systolic array for tensor processing

2020 
Systolic arrays are regaining attention as the core of machine learning accelerators. This paper shows that, despite the simple structure of systolic arrays, a large design space exists at the logic level, and it proposes a novel systolic array based on factoring and radix-8 multipliers. The factored systolic array (FSA) factors out the Booth encoding and the hard-multiple generation, which are common across all processing elements, reducing the delay and area of the whole systolic array. This factoring comes at the cost of an increased number of registers; however, the reduced pipeline register requirement of radix-8 offsets this effect. The proposed factored 16-bit multiplier achieves up to 15%, 13%, and 23% better delay, area, and power, respectively, compared with radix-4 multipliers, even when the register overhead is included. The proposed FSA architecture improves delay, area, and power by up to 11%, 20%, and 31%, respectively, across different bitwidths when compared with the conventional radix-4 systolic array.
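To make the factoring idea concrete, the following is a minimal behavioral sketch in Python of radix-8 Booth multiplication with the encoding and the hard multiple (3x) computed once, outside the per-element multiply step. It is an illustration under stated assumptions, not the paper's circuit: the function names (`booth_radix8_digits`, `multiples`, `pe_multiply`) are hypothetical, and which operand is encoded and which supplies the multiples in the actual FSA is not specified in the abstract.

```python
def booth_radix8_digits(y, bits=16):
    """Recode a signed `bits`-wide integer into radix-8 Booth digits in [-4, 4].

    Digit i is -4*y[3i+2] + 2*y[3i+1] + y[3i] + y[3i-1], with y[-1] = 0.
    """
    n_digits = -(-bits // 3)            # ceil(bits / 3)
    width = 3 * n_digits                # sign-extend to a multiple of 3 bits
    u = y & ((1 << width) - 1)          # two's-complement bit pattern
    u <<= 1                             # append the implicit y[-1] = 0
    digits = []
    for i in range(n_digits):
        g = (u >> (3 * i)) & 0xF        # overlapping 4-bit group
        d = -4 * ((g >> 3) & 1) + 2 * ((g >> 2) & 1) + ((g >> 1) & 1) + (g & 1)
        digits.append(d)
    return digits


def multiples(x):
    """Multiples a radix-8 element selects from; 3x is the 'hard' multiple."""
    x3 = (x << 1) + x                   # needs a real adder; the rest are shifts
    return {0: 0, 1: x, 2: x << 1, 3: x3, 4: x << 2}


def pe_multiply(mults, digits):
    """Per-element work once encoding and 3x are shared: select, negate, shift, add."""
    acc = 0
    for i, d in enumerate(digits):
        pp = mults[abs(d)]
        acc += (-pp if d < 0 else pp) << (3 * i)
    return acc


# Assumed usage: digits and multiples are computed once, then reused.
digits = booth_radix8_digits(-1234)
product = pe_multiply(multiples(567), digits)
```

In this sketch a 16-bit operand yields six radix-8 digits, versus eight for radix-4, which hints at why fewer partial products need to be registered; the trade-off is the extra adder for 3x, which the factoring amortizes by computing it once rather than in every processing element.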