An Optimal Design Method of Conv2d Operator for TensorFlow Based on FPGA Accelerator

2020 
Currently, the TensorFlow architecture only supports CPU and GPU programming and has not yet established a unified support standard for FPGAs. To the best of our knowledge, when a forward operator in TensorFlow specifies a new device, the backward gradient operator in the same neural network cannot use that device, which violates TensorFlow's rules for node device allocation. We therefore propose an improved node device allocation algorithm based on the placement mechanism, together with an OpenCL-based optimization algorithm for the conv2d operator. The improved allocation algorithm lets forward and backward operators running on the FPGA accelerator satisfy the node and device allocation requirements imposed on all TensorFlow operators, while the OpenCL-based conv2d optimization fully exploits the parallel computing strengths of the FPGA. Finally, we conduct experiments with the LeNet-5 CNN model on the MNIST dataset. For the conv2d operator on the FPGA accelerator, we implement both the forward and backward operators involved in the first four layers of the model. The experimental results show that all three methods achieve accuracy above 98%; compared with the CPU and GPU, the accuracy difference is only about 0.5%. In addition, we measured the runtime of the first layer's conv2d operator under different batch sizes: when the batch size increases to 10000, the FPGA runs 9 times faster than the CPU. This demonstrates a practical solution that enables TensorFlow to use FPGA operators for neural network computation.
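As an illustration of what such a forward/backward operator pair must compute (a minimal sketch in plain Python, not the paper's actual OpenCL kernel; the function names are hypothetical), the single-channel, stride-1, valid-padding conv2d and its weight gradient can be written as:

```python
# Illustrative sketch only: conv2d as cross-correlation (TensorFlow's
# convention) plus the kernel gradient, i.e. the two computations that
# the forward and backward operators on the FPGA must implement.

def conv2d_forward(x, w):
    """x: H x W input, w: kH x kW kernel -> (H-kH+1) x (W-kW+1) output."""
    H, W = len(x), len(x[0])
    kH, kW = len(w), len(w[0])
    out = [[0.0] * (W - kW + 1) for _ in range(H - kH + 1)]
    for i in range(H - kH + 1):
        for j in range(W - kW + 1):
            acc = 0.0
            for a in range(kH):
                for b in range(kW):
                    # Multiply-accumulate: the inner loops an FPGA unrolls in parallel.
                    acc += x[i + a][j + b] * w[a][b]
            out[i][j] = acc
    return out

def conv2d_backward_w(x, dy):
    """Gradient of the loss w.r.t. the kernel, given upstream gradient dy."""
    oH, oW = len(dy), len(dy[0])
    kH, kW = len(x) - oH + 1, len(x[0]) - oW + 1
    dw = [[0.0] * kW for _ in range(kH)]
    for a in range(kH):
        for b in range(kW):
            for i in range(oH):
                for j in range(oW):
                    dw[a][b] += dy[i][j] * x[i + a][j + b]
    return dw
```

Because the backward pass reuses the same multiply-accumulate structure as the forward pass, placing both operators on the same FPGA device (as the proposed allocation algorithm permits) avoids moving the activations back to the host between the two passes.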