VLSI architectures for real-time signal processing

1991 
We address the problem of developing efficient special-purpose VLSI architectures for several important real-time signal processing tasks, namely the one-dimensional discrete Hartley transform (DHT) and discrete cosine transform (DCT), multidimensional transforms, template matching, and block matching. An important requirement of all these architectures is that they process huge amounts of data at very high throughput rates. The first problem that we address is the development of systolic array architectures for computing the one-dimensional DHT and DCT over $N$ points, when $N$ is factorizable into mutually prime factors $N_1$ and $N_2$. We map the one-dimensional transform into a two-dimensional transform over $(N_1 \times N_2)$ points, so that the algorithm consists of computing one-dimensional transforms over the columns and rows of the two-dimensional data array. This mapping considerably reduces the hardware requirement. The architecture consists of simple, regular units that are completely pipelined. Next we look at the more general problem of computing any $(N \times N \times \cdots \times N)$ $d$-dimensional linear separable transform (DXT). Here we develop a family of optimal architectures with area-time trade-offs. The architecture consists of one-dimensional DXT($N$) computation units, which compute DXT($N$) over one index, and permutation units, which reorder the data so that in the next iteration DXT($N$) can be computed over the next index. The architecture has area $A = O(N^{d+2a})$ and computation time $T = O(dN^{\frac{d}{2}-a}\, b)$ for all $a$ in the range $\frac{1}{2}\log_N b \leq a \leq \frac{d}{2}$, where $b = O(\log M)$ is the precision. The third problem that we address is the development of efficient architectures for operations with very high input/output (I/O) bandwidth requirements, such as template matching and block matching. Here we develop a linear semi-systolic array architecture that balances the computations in the processor array against the I/O requirements. The I/O bandwidth is reduced by storing part of the input image on-chip in shift registers in each processor, and by circulating the shift registers. The architecture achieves optimal speed-up.
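The alternation between one-dimensional transform units and permutation units described above can be illustrated in software. The following is a minimal sketch, not the architecture itself: it assumes the unnormalized DCT-II as the one-dimensional transform, stores the $d$-dimensional array as a flat row-major list, and models each permutation unit as a cyclic rotation of the indices. The names `dct_1d`, `rotate_indices`, and `separable_transform` are hypothetical and chosen only for this illustration.

```python
import math


def dct_1d(v):
    """Unnormalized DCT-II of a short sequence (direct O(N^2) form)."""
    n = len(v)
    return [
        sum(v[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n))
        for k in range(n)
    ]


def rotate_indices(flat, n):
    """Permutation step: move the last (fastest-varying) index to the front.

    For a row-major array with indices (i_0, ..., i_{d-1}), the result is
    indexed by (i_{d-1}, i_0, ..., i_{d-2}).
    """
    rest = len(flat) // n  # number of length-n rows
    out = [0.0] * len(flat)
    for q in range(rest):
        for r in range(n):
            out[r * rest + q] = flat[q * n + r]
    return out


def separable_transform(flat, n, d, transform_1d=dct_1d):
    """Apply a 1-D transform along every index of an n^d array.

    Each of the d iterations transforms along the last index and then
    rotates the indices, so after d iterations the data is back in its
    original index order.
    """
    assert len(flat) == n ** d
    data = list(flat)
    for _ in range(d):
        transformed = []
        for start in range(0, len(data), n):
            transformed.extend(transform_1d(data[start:start + n]))
        data = rotate_indices(transformed, n)
    return data
```

Because the index rotation is applied exactly $d$ times, the output appears in the same index order as the input, so no final reordering pass is needed in this sketch.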