High throughput, low latency, memory optimized 64K point FFT architecture using novel radix-4 butterfly unit

2013 
In this paper we propose a fully parallel 64K-point radix-4^4 FFT processor. The radix-4^4 parallel unrolled architecture uses a novel radix-4 butterfly unit that takes all four inputs in parallel and can selectively produce one of the four outputs. The radix-4^4 block takes all 256 inputs in parallel and uses the select control signals to generate one of the 256 outputs. The resulting 64K-point FFT processor shows a significant reduction in intermediate memory, at the cost of increased hardware complexity. Compared to the state-of-the-art implementation [5], our architecture shows reduced latency with comparable throughput and area. The 64K-point FFT architecture was synthesized in a 130 nm CMOS technology, resulting in a throughput of 1.4 GSPS and a latency of 47.7 μs at a maximum clock frequency of 350 MHz. Compared to [5], the latency is reduced by 303 μs with a 50.8% reduction in area.
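As an illustration of the selective butterfly described in the abstract, the following is a minimal behavioral sketch in Python, not the authors' hardware design. It assumes the standard radix-4 DFT definition X[k] = sum over n of x[n] * (-j)^(n*k), and models the select control as an integer `sel` in 0..3. The function name `radix4_select` and its signature are hypothetical and chosen only for this example.

```python
# Behavioral sketch of a radix-4 butterfly that takes four inputs in
# parallel and returns only the output chosen by a select signal.
# Assumption: standard 4-point DFT, X[k] = sum_n x[n] * (-j)**(n*k).

# Powers of -j: (-j)**0, (-j)**1, (-j)**2, (-j)**3
_W4 = (1, -1j, -1, 1j)


def radix4_select(x, sel):
    """Return output `sel` (0..3) of the 4-point DFT of the sequence x."""
    if len(x) != 4:
        raise ValueError("a radix-4 butterfly takes exactly four inputs")
    if sel not in (0, 1, 2, 3):
        raise ValueError("sel must be 0, 1, 2, or 3")
    # Each twiddle is a trivial rotation (+1, -j, -1, +j), so hardware
    # needs only sign changes and real/imaginary swaps, no multipliers.
    return sum(x[n] * _W4[(n * sel) % 4] for n in range(4))


if __name__ == "__main__":
    inputs = (1, 2, 3, 4)
    for k in range(4):
        print(k, radix4_select(inputs, k))
```

Computing only the selected output means the other three results never need to be stored, which is one plausible reading of how the select signals reduce intermediate memory; the abstract itself does not spell out the mechanism.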