Loop Optimizations of MGS-QRD Algorithm for FPGA High-Level Synthesis

2019 
The best-known Modified Gram-Schmidt QR decomposition (MGS-QRD) algorithm contains many data, memory, loop, and control dependencies that hinder high-level synthesis (HLS) from optimizing it. We therefore present a well-formed algorithm structure that reduces latency and hardware resource usage. We also present a second MGS-QRD algorithm that further reduces DSP usage and supports larger QR decomposition sizes. The proposed algorithms achieve better overall performance than the best-known MGS-QRD algorithm. Mapped to an Intel Arria 10 FPGA device, the first proposed algorithm achieves an implemented system latency of 0.53 µs for an 8x8 real QRD, and the second proposed algorithm achieves 0.59 µs. We also describe the HLS optimization steps and dependence analysis used to improve performance, which yield an approximately 44-fold increase in QRD throughput.
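For orientation, the loop structure that the dependency discussion refers to can be seen in a textbook Modified Gram-Schmidt QR decomposition. The sketch below is only that textbook form, written in plain Python for readability. It is not either of the proposed algorithms, and it is not HLS source code. The function name `mgs_qr` is hypothetical, and the input is assumed to be a real matrix with full column rank, so no diagonal entry of R is zero.

```python
import math

def mgs_qr(a):
    """Textbook Modified Gram-Schmidt QR of a real m x n matrix (m >= n).

    `a` is a list of rows. Returns (q, r) with a = q * r, where q has
    orthonormal columns and r is upper triangular.
    Assumes full column rank (illustrative sketch, no pivoting).
    """
    m, n = len(a), len(a[0])
    # Column-major working copy: v[j] is column j of the input.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q = [[0.0] * m for _ in range(n)]  # q[k] is column k of Q
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Column norm: a sum reduction followed by a square root.
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        # Normalization depends on the norm computed just above.
        q[k] = [x / r[k][k] for x in v[k]]
        # Update the remaining columns. The iterations over j do not
        # depend on each other, but each needs q[k] from this outer
        # iteration, and outer iteration k + 1 needs the updated v[k + 1].
        for j in range(k + 1, n):
            r[k][j] = sum(q[k][i] * v[j][i] for i in range(m))
            v[j] = [v[j][i] - r[k][j] * q[k][i] for i in range(m)]
    # Return Q in row-major form to match the input layout.
    q_rows = [[q[j][i] for j in range(n)] for i in range(m)]
    return q_rows, r
```

Calling `mgs_qr([[3.0, 0.0], [4.0, 5.0]])` returns a Q with orthonormal columns and an upper-triangular R whose product reproduces the input. The comments mark the loop-carried dependencies (norm, then normalization, then column updates, then the next outer iteration) that are the kind of chain an HLS tool has to schedule around.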