CIMAT: A Compute-In-Memory Architecture for On-chip Training Based on Transpose SRAM Arrays
Citations: 40 | References: 32 | Related papers: 10
Abstract:
Rapid development in deep neural networks (DNNs) is enabling many intelligent applications. However, on-chip training of DNNs is challenging due to its extensive computation and memory bandwidth requirements. To overcome the memory wall bottleneck, the compute-in-memory (CIM) approach exploits analog computation along the bit lines of the memory array, thus significantly speeding up vector-matrix multiplications. So far, most CIM-based architectures target inference engines for offline training only. In this article, we propose CIMAT, a CIM Architecture for Training. At the bitcell level, we design two versions of transpose SRAM, 7T and 8T, to implement the bi-directional vector-to-matrix multiplication needed for feedforward (FF) and backpropagation (BP). Moreover, we design the peripheral circuitry, mapping strategy, and data flow for the BP process and weight update to support CIM-based on-chip training. To further improve training performance, we explore pipeline optimization of the proposed architecture. We use mature and advanced 7 nm CMOS technology to design the CIMAT architecture with a 7T/8T transpose SRAM array that supports bi-directional parallel read. We explore the 8-bit training performance of ResNet-18 on ImageNet, showing that the 7T-based design achieves 3.38× higher energy efficiency (~6.02 TOPS/W) and 4.34× higher frame rate (~4,020 fps) with only 50 percent of the chip area, compared to the baseline architecture with a conventional 6T SRAM array that supports row-by-row read only. The 8T-based architecture performs even better, reaching ~10.79 TOPS/W and ~48,335 fps with 74 percent of the baseline chip area.
Keywords:
Transpose
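The bi-directional vector-matrix multiplication described in the abstract can be illustrated with a purely functional model. The sketch below is hypothetical (the class `TransposeArrayModel` and its methods are not from the paper) and ignores all circuit behavior: one stored copy of an 8-bit weight matrix W serves the feedforward read (inputs on rows, sums along columns, i.e. x^T W), the backpropagation read (errors on columns, sums along rows, i.e. W delta), and the outer product used for the weight update.

```python
import numpy as np


class TransposeArrayModel:
    """Hypothetical functional model of a transpose SRAM compute-in-memory array.

    A single copy of the weight matrix W (n_rows x n_cols) is stored. The
    feedforward read drives inputs on the rows and accumulates along each
    column; the backpropagation read drives errors on the columns and
    accumulates along each row. ADC precision, bit-serial input encoding and
    other circuit effects are ignored.
    """

    def __init__(self, weights):
        w = np.asarray(weights)
        if w.min() < -128 or w.max() > 127:
            raise ValueError("weights must fit in signed 8 bits")
        self.w = w.astype(np.int8)

    def forward(self, x):
        # FF: y[j] = sum_i x[i] * W[i, j]  (column-wise accumulation, x^T W)
        return np.asarray(x, dtype=np.int64) @ self.w.astype(np.int64)

    def backward(self, delta):
        # BP on the same array: e[i] = sum_j W[i, j] * delta[j]  (row-wise, W delta)
        return self.w.astype(np.int64) @ np.asarray(delta, dtype=np.int64)

    def weight_gradient(self, x, delta):
        # Outer product for the weight update: dW[i, j] = x[i] * delta[j]
        return np.outer(np.asarray(x, dtype=np.int64), np.asarray(delta, dtype=np.int64))


arr = TransposeArrayModel([[1, 2, 3], [4, 5, 6]])
print(arr.forward([1, 1]))        # [5 7 9]
print(arr.backward([1, 0, -1]))   # [-2 -2]
```

In this model no transposed copy of W is ever materialized; the transpose exists only as a second read direction over the same storage.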
The ability to reproduce and transpose matrices was studied in 23 adult SSN trainees. These subjects could transpose the matrices at a mental age (MA) equivalent to the chronological age (CA) at which normal children do so. Unlike in normal children, the ability to reproduce the matrices did not emerge before the ability to transpose them. One possible interpretation of this finding is that cognitive development in SSNs is qualitatively different from that of normal children.
Transpose
Citations (3)
Synthetic aperture radar (SAR) is a high-resolution imaging radar, and matrix transposition is a critical step in real-time SAR signal processing; the efficiency of the transpose greatly influences the performance of the signal processing system. Several popular methods can be used to transpose a matrix, such as in-row-out-column (IROC), in-column-out-row (ICOR), and the two-page or three-page transpose methods. However, these methods are not very efficient. In this paper, a new matrix transpose method is proposed based on the existing methods. An actual hardware platform is used for real-time imaging with the new matrix transpose method. The matrix transpose efficiency is 78%, with an entire SAR imaging runtime of 10 seconds. Experimental results demonstrate that the proposed method solves the matrix transpose problem efficiently.
Transpose
Matrix (chemical analysis)
Citations (4)
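For reference, the baseline in-row-out-column (IROC) scheme named in the SAR abstract above can be sketched as follows; the paper's improved method is not described in the abstract and is not reproduced here. The flat Python list is an assumed stand-in for external memory, and `iroc_transpose` is a hypothetical name. Writes are sequential and reads are strided by the row length; ICOR is the mirror image (strided writes, sequential reads).

```python
def iroc_transpose(rows):
    """Transpose a matrix with the in-row-out-column (IROC) access pattern.

    `rows` arrive one row at a time and are written sequentially into a flat
    buffer (a stand-in for external memory); the transposed matrix is then read
    back column by column, i.e. with a stride equal to the row length.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    buffer = []
    for row in rows:                      # "in row": sequential writes
        if len(row) != n_cols:
            raise ValueError("all rows must have the same length")
        buffer.extend(row)
    return [
        [buffer[r * n_cols + c] for r in range(n_rows)]  # "out column": strided reads
        for c in range(n_cols)
    ]


print(iroc_transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```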
We present a decomposition method for parallelizing multi-dimensional FFTs with two distinguishing features: adaptive decomposition and transpose-order awareness, both aimed at minimal communication volume. Building on a row-wise decomposition that maps the multi-dimensional data onto one-dimensional data so that it can be allocated equally among processes, our method adaptively decomposes the data in the lowest possible dimensions to reduce communication volume in the first place, unlike previous works that use predefined decomposition dimensions. This decomposition also admits many possible data transpose orders, and different transpose orders result in different communication volumes. By analyzing all possible cases, we identify the transpose orders with minimal communication volume for 3-D, 4-D, and 5-D FFTs.
Transpose
Citations (3)
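The reason transposes appear in a row-wise FFT decomposition is easiest to see in the 2-D case: a 2-D FFT equals 1-D FFTs along the rows, a transpose, 1-D FFTs along the new rows, and a transpose back. The single-process NumPy sketch below (function name hypothetical) checks this against `numpy.fft.fft2`. In a distributed run each transpose would be an all-to-all exchange; the adaptive choice of decomposition dimensions and transpose order from the abstract is not modeled.

```python
import numpy as np


def fft2_via_transpose(a):
    """2-D FFT computed as row FFTs, a transpose, row FFTs, and a transpose back."""
    a = np.asarray(a, dtype=complex)
    step1 = np.fft.fft(a, axis=1)      # 1-D FFTs along each row (local to a process)
    step2 = step1.T                    # transpose: the communication step when distributed
    step3 = np.fft.fft(step2, axis=1)  # 1-D FFTs along the original columns
    return step3.T                     # transpose back to the original layout


rng = np.random.default_rng(0)
data = rng.standard_normal((4, 6))
print(np.allclose(fft2_via_transpose(data), np.fft.fft2(data)))  # True
```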
Considering different matrix storage methods, we comparatively analyze several matrix transpose methods, propose an improvement to the classic transpose algorithm for general matrices, and present several matrix transpose algorithms written in C. By analyzing the time and space complexity of these algorithms, we summarize their advantages and disadvantages.
Transpose
Matrix (chemical analysis)
Citations (0)
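Two textbook baselines illustrate the kind of time/space trade-off the abstract above compares. They are sketched here in Python on a row-major flat array rather than in C; neither is the paper's improved algorithm, and the function names are hypothetical. The out-of-place version handles any m × n matrix but needs O(mn) extra space, while the in-place version needs only O(1) extra space but, as written, works only for square matrices.

```python
def transpose_out_of_place(a, m, n):
    """Transpose an m x n row-major matrix into a new n x m row-major list.

    Time O(m*n); extra space O(m*n) for the result.
    """
    if len(a) != m * n:
        raise ValueError("size mismatch")
    b = [0] * (m * n)
    for i in range(m):
        for j in range(n):
            b[j * m + i] = a[i * n + j]   # element (i, j) moves to (j, i)
    return b


def transpose_in_place_square(a, n):
    """Transpose an n x n row-major matrix in place by swapping across the diagonal.

    Time O(n^2); extra space O(1).
    """
    for i in range(n):
        for j in range(i + 1, n):
            a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
    return a


print(transpose_out_of_place([1, 2, 3, 4, 5, 6], 2, 3))  # [1, 4, 2, 5, 3, 6]
print(transpose_in_place_square([1, 2, 3, 4], 2))        # [1, 3, 2, 4]
```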
Transpose
Matrix (chemical analysis)
Citations (0)
It is well known that the positive partial transpose (PPT) criterion is an operational criterion for determining separability. However, in high-dimensional systems (dimension > 6), this criterion is not sufficient. How can we judge whether an entangled state in an arbitrary high-dimensional quantum system is PPT? Here we propose a linear algebra method for checking the positivity of the partial transpose of a state, and we present a set of entangled mixed states with non-positive partial transpose.
Transpose
Peres–Horodecki criterion
Citations (2)
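A direct numerical version of the PPT test (the standard eigenvalue check, not the linear algebra method proposed in the abstract above) reshapes the density matrix, swaps the two indices of subsystem B, and inspects the smallest eigenvalue. For 2 × 2 and 2 × 3 systems (total dimension at most 6), a negative eigenvalue is equivalent to entanglement; beyond that, a PPT state may still be entangled. Function names below are illustrative.

```python
import numpy as np


def partial_transpose(rho, d_a, d_b):
    """Partial transpose over subsystem B of a (d_a*d_b) x (d_a*d_b) matrix."""
    r = np.asarray(rho).reshape(d_a, d_b, d_a, d_b)
    # rho[(i, j), (k, l)] -> rho[(i, l), (k, j)]: swap only the B indices
    return r.transpose(0, 3, 2, 1).reshape(d_a * d_b, d_a * d_b)


def is_ppt(rho, d_a, d_b, tol=1e-12):
    """True if the partial transpose has no eigenvalue below -tol."""
    eigenvalues = np.linalg.eigvalsh(partial_transpose(rho, d_a, d_b))
    return bool(eigenvalues.min() >= -tol)


bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>) / sqrt(2)
rho_bell = np.outer(bell, bell.conj())
print(is_ppt(rho_bell, 2, 2))                # False
print(is_ppt(np.eye(4) / 4, 2, 2))           # True
```

For the Bell state, the partial transpose is half the swap operator, with eigenvalues 1/2, 1/2, 1/2 and -1/2, so the test reports a non-PPT (entangled) state; the maximally mixed state passes.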
PROC TRANSPOSE continues to confuse programmers. The ability to transpose a data set effectively is very important when working with different data structures and data standards. This paper provides a non-technical approach to understanding the transpose procedure by showing the programmer how to visualize the expected output. PROC TRANSPOSE is deconstructed into three simple movements: what goes up, what goes down, and what goes into the middle. Programmers who have a hard time fully grasping PROC TRANSPOSE will benefit from this paper.
Transpose
Programmer
Citations (0)
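The three movements in the PROC TRANSPOSE abstract above can be mimicked outside SAS with a small long-to-wide reshape. This is a hypothetical Python analogue, not SAS code, and the mapping used here (ID values go up to become column names, BY values go down to label the rows, VAR values go into the middle as cell values) is one reading of the paper's metaphor.

```python
def transpose_like_proc(records, by, id_var, var):
    """Long-to-wide reshape in the spirit of PROC TRANSPOSE with BY, ID and VAR.

    - values of `id_var` go up and become the new column names,
    - values of `by` go down and label the output rows,
    - values of `var` go into the middle and fill the cells.
    Cells with no input record are simply absent from that row's dict.
    """
    columns = []
    rows = {}
    for rec in records:
        key, col = rec[by], rec[id_var]
        if col not in columns:
            columns.append(col)
        rows.setdefault(key, {by: key})[col] = rec[var]
    return columns, list(rows.values())


long_data = [
    {"subject": 1, "visit": "V1", "score": 10},
    {"subject": 1, "visit": "V2", "score": 12},
    {"subject": 2, "visit": "V1", "score": 9},
]
cols, wide = transpose_like_proc(long_data, by="subject", id_var="visit", var="score")
print(cols)  # ['V1', 'V2']
print(wide)  # [{'subject': 1, 'V1': 10, 'V2': 12}, {'subject': 2, 'V1': 9}]
```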
In this article, we study the class of positive partial transpose blocks. We introduce several inequalities related to this class, with an emphasis on comparing the main diagonal and off-diagonal components of a $2 \times 2$ positive partial transpose block.
Transpose
Citations (2)
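Under the usual definition, a 2 × 2 block matrix [[A, X], [X*, B]] is a positive partial transpose (PPT) block when both it and its partial transpose [[A, X*], [X, B]] are positive semidefinite. The sketch below (function names hypothetical) simply tests that definition numerically for square blocks of equal size; it does not implement any of the inequalities from the abstract above.

```python
import numpy as np


def is_psd(m, tol=1e-12):
    """Hermitian with no eigenvalue below -tol."""
    m = np.asarray(m)
    return bool(np.allclose(m, m.conj().T) and np.linalg.eigvalsh(m).min() >= -tol)


def is_ppt_block(a, x, b, tol=1e-12):
    """Check whether [[A, X], [X*, B]] is a PPT block (A, X, B all n x n)."""
    a, x, b = (np.asarray(t, dtype=complex) for t in (a, x, b))
    m = np.block([[a, x], [x.conj().T, b]])      # the block matrix itself
    m_pt = np.block([[a, x.conj().T], [x, b]])   # its partial transpose
    return is_psd(m, tol) and is_psd(m_pt, tol)


print(is_ppt_block(np.eye(2), 0.5 * np.eye(2), np.eye(2)))               # True
print(is_ppt_block(np.diag([1, 0]), [[0, 1], [0, 0]], np.diag([0, 1])))  # False
```

The second example is positive semidefinite itself, but its partial transpose has eigenvalue -1, so it is not a PPT block.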
In this paper, an Adaptive Modified Transpose Jacobian approach is presented, which employs an adaptive algorithm to tune the control gains. To overcome the disadvantages of large gains, as well as the need for trial and error in finding gains for the Transpose Jacobian algorithm, a modified version called Modified Transpose Jacobian was proposed in the literature. In this paper, an adaptive method is used to adjust the gains of the Modified Transpose Jacobian algorithm. Simulations in MATLAB and ADAMS of a two-fingered robotic hand tracking a circular path with straight corners are used to compare the tracking performance of the proposed algorithm with the Transpose Jacobian, Modified Transpose Jacobian, and Adaptive Transpose Jacobian control algorithms. The results reveal that the proposed Adaptive Modified Transpose Jacobian control performs better than the other three methods in high-speed motion. In low-speed motion, the Adaptive Transpose Jacobian and Modified Transpose Jacobian algorithms yield the best performance compared with the other non-adaptive algorithms.
Transpose
Citations (0)
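For context, the basic Transpose Jacobian law that the modified and adaptive variants in the abstract above build on maps a Cartesian PD force through the transposed Jacobian: tau = J(q)^T [Kp (x_des - x) - Kd x_dot]. It needs the kinematic Jacobian but no dynamic model of the manipulator. The sketch below applies it to a hypothetical planar two-link arm with fixed gains; the Modified and Adaptive variants, which change how the gains and feedback terms are obtained, are not implemented.

```python
import numpy as np


def jacobian_2link(q, l1=1.0, l2=1.0):
    """Position Jacobian of a planar two-link arm with link lengths l1, l2."""
    q1, q2 = q
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [l1 * c1 + l2 * c12, l2 * c12]])


def transpose_jacobian_torque(q, x, x_dot, x_des, kp, kd, l1=1.0, l2=1.0):
    """Basic Transpose Jacobian law: tau = J(q)^T [Kp (x_des - x) - Kd x_dot]."""
    force = kp @ (np.asarray(x_des) - np.asarray(x)) - kd @ np.asarray(x_dot)
    return jacobian_2link(q, l1, l2).T @ force


q = (0.0, np.pi / 2)        # elbow bent 90 degrees; end effector at (1, 1)
tau = transpose_jacobian_torque(q, x=(1.0, 1.0), x_dot=(0.0, 0.0), x_des=(2.0, 1.0),
                                kp=10.0 * np.eye(2), kd=np.eye(2))
print(np.round(tau, 6))     # [-10. -10.]
```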
This paper presents a Tensor Transposition Library for GPUs (TTLG). A distinguishing feature of TTLG is that it includes a performance prediction model, which can be used by higher-level optimizers that rely on tensor transposition. For example, tensor contractions are often implemented with the TTGT (Transpose-Transpose-GEMM-Transpose) approach: the input tensors are transposed to a suitable layout, multiplied with high-performance matrix multiplication (GEMM), and the result is transposed. TTLG also uses the performance model internally to choose among alternative kernels and slicing/blocking parameters for the transposition. TTLG is compared with current state-of-the-art GPU alternatives and shows comparable or better transposition times in the "repeated-use" scenario and considerably better "single-use" performance.
Transpose
Transposition (logic)
Citations (6)
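The TTGT idea from the TTLG abstract above can be shown on a small contraction out[i, m, j] = Σ_k lhs[i, k, j] · rhs[k, m]. In this NumPy sketch (function name hypothetical, and unrelated to TTLG's GPU kernels), only `lhs` needs a transpose before the GEMM because `rhs` already has the contracted index first; the result is then transposed into the requested layout and checked against `numpy.einsum`.

```python
import numpy as np


def contract_ttgt(lhs, rhs):
    """out[i, m, j] = sum_k lhs[i, k, j] * rhs[k, m], via Transpose-GEMM-Transpose."""
    lhs = np.asarray(lhs)
    rhs = np.asarray(rhs)
    ni, nk, nj = lhs.shape
    nk_rhs, nm = rhs.shape
    if nk != nk_rhs:
        raise ValueError("contracted dimensions differ")
    # Transpose lhs to (i, j, k) so k is last, then flatten to an (i*j) x k matrix
    lhs_mat = lhs.transpose(0, 2, 1).reshape(ni * nj, nk)
    # One GEMM: (i*j) x k times k x m
    out_mat = lhs_mat @ rhs
    # Unflatten to (i, j, m) and transpose to the requested (i, m, j) layout
    return out_mat.reshape(ni, nj, nm).transpose(0, 2, 1)


rng = np.random.default_rng(1)
lhs = rng.standard_normal((3, 4, 5))
rhs = rng.standard_normal((4, 2))
print(np.allclose(contract_ttgt(lhs, rhs), np.einsum("ikj,km->imj", lhs, rhs)))  # True
```

Reshaping a transposed view forces a physical copy in memory, which is exactly the data movement a transposition library such as TTLG aims to make cheap.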