Design of a Low Power Bfloat16 Pipelined MAC Unit for Deep Neural Network Applications

2021 
The evolution of artificial intelligence (AI) and advances in semiconductor technology have enabled the design of many complex systems, ranging from IoT-based applications to high-performance compute engines. AI incorporates various application-driven machine learning algorithms in which floating-point numbers are employed for training neural network models. However, simpler number systems such as fixed-point and integer formats are often employed in inference because their smaller bit-widths reduce area and power consumption, at the cost of accuracy lost to quantization. Using a floating-point MAC improves accuracy, but it results in larger area and higher power consumption. In this paper, an area- and power-efficient pipelined Bfloat16 MAC is proposed, aimed at improving the performance of neural network applications. The proposed unit handles overflow, underflow, and normalization efficiently. Additionally, the computational accuracy of the MAC is improved by increasing the mantissa bit-width and by eliminating normalization in the intermediate stages. The proposed non-pipelined MAC utilizes 18.61% fewer resources than comparable architectures. The area and power of the proposed 16-bit non-pipelined Bfloat16 MAC are reduced by 5.21% and 32%, respectively, at 200 MHz compared to another 16-bit non-pipelined Bfloat16 MAC reported in [26]. The area and power of the proposed MAC are improved by 38.6% and 93% at 200 MHz, and by 7.1% and 11.52% at 1 GHz, when compared with a 16-bit pipelined posit MAC and a pipelined Bfloat16 MAC reported in [27], respectively.
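For context, Bfloat16 retains the 8-bit exponent of IEEE-754 single precision and truncates the mantissa to 7 bits, so a Bfloat16 value is simply the upper 16 bits of a float32. The C sketch below is a minimal software model of a Bfloat16 MAC under that assumption; the function names are illustrative, truncation stands in for a hardware rounding scheme, and it does not reproduce the paper's pipelined datapath or its widened-mantissa accumulation.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bfloat16 stored as a raw 16-bit pattern: 1 sign, 8 exponent, 7 mantissa bits. */
typedef uint16_t bf16;

/* Convert float32 to bfloat16 by truncation (assumption: no
   round-to-nearest-even, unlike a typical hardware implementation). */
static bf16 f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16)(bits >> 16);          /* keep sign, 8-bit exponent, top 7 mantissa bits */
}

/* Widen bfloat16 back to float32 by zero-filling the low 16 bits. */
static float bf16_to_f32(bf16 b) {
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* acc + a * b: multiply and accumulate in float32 precision, then
   truncate the result back to bfloat16. */
static bf16 bf16_mac(bf16 acc, bf16 a, bf16 b) {
    float r = bf16_to_f32(acc) + bf16_to_f32(a) * bf16_to_f32(b);
    return f32_to_bf16(r);
}

int main(void) {
    bf16 acc = f32_to_bf16(0.0f);
    bf16 a = f32_to_bf16(1.5f);         /* 1.5 and 2.25 are exactly representable */
    bf16 b = f32_to_bf16(2.25f);
    acc = bf16_mac(acc, a, b);
    printf("acc = %f\n", bf16_to_f32(acc));  /* prints 3.375000 */
    return 0;
}

Performing the product and accumulation in the wider float32 format before the final truncation mirrors, in spirit, the paper's point that widening the mantissa and deferring normalization in intermediate stages improves computational accuracy.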