CIMAT: A Compute-In-Memory Architecture for On-chip Training Based on Transpose SRAM Arrays

IEEE Transactions on Computers (2020)

Hongwu Jiang, Xiaochen Peng, Shanshi Huang, Shimeng Yu

Abstract:

Rapid development in deep neural networks (DNNs) is enabling many intelligent applications. However, on-chip training of DNNs is challenging due to its extensive computation and memory bandwidth requirements. To address the memory wall bottleneck, the compute-in-memory (CIM) approach exploits analog computation along the bit lines of the memory array and thereby significantly speeds up vector-matrix multiplications. So far, most CIM-based architectures have targeted inference engines only, with training performed offline. In this article, we propose CIMAT, a CIM Architecture for Training. At the bitcell level, we design two versions of transpose SRAM, 7T and 8T, to implement the bi-directional vector-matrix multiplication needed for feedforward (FF) and backpropagation (BP). Moreover, we design the periphery circuitry, mapping strategy, and data flow for the BP process and weight update to support on-chip training based on CIM. To further improve training performance, we explore pipeline optimization of the proposed architecture. We use mature, advanced CMOS technology at 7 nm to design the CIMAT architecture with a 7T/8T transpose SRAM array that supports bi-directional parallel read. We explore 8-bit training performance of ResNet-18 on ImageNet, showing that the 7T-based design can achieve 3.38× higher energy efficiency (~6.02 TOPS/W), a 4.34× higher frame rate (~4,020 fps), and only 50 percent of the chip size compared to the baseline architecture with a conventional 6T SRAM array that supports row-by-row read only. Even better performance is obtained with the 8T-based architecture, which can reach ~10.79 TOPS/W and ~48,335 fps with 74 percent of the chip area of the baseline.
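The bi-directional vector-matrix multiplication mentioned in the abstract can be illustrated in software. The following is only a minimal sketch, not the CIMAT circuit or its data flow: it assumes a single weight array W that is read along one direction for feedforward (x·W) and along the other for backpropagation (delta·W^T), so no transposed copy of W is ever built. The function names and the numeric values are hypothetical.

```python
# Sketch: one stored weight array W serves both training passes.
# Feedforward accumulates down the columns of W (x @ W); backpropagation
# accumulates along the rows of the same array (delta @ W^T).

def feedforward(x, W):
    """y[j] = sum_i x[i] * W[i][j]."""
    rows, cols = len(W), len(W[0])
    return [sum(x[i] * W[i][j] for i in range(rows)) for j in range(cols)]

def backpropagate(delta, W):
    """e[i] = sum_j delta[j] * W[i][j], i.e. delta @ W^T without forming W^T."""
    rows, cols = len(W), len(W[0])
    return [sum(delta[j] * W[i][j] for j in range(cols)) for i in range(rows)]

# Hypothetical layer with 2 inputs and 3 outputs.
W = [[1, 2, 3],
     [4, 5, 6]]
x = [1, -1]          # input activations
delta = [1, 0, 2]    # error arriving from the next layer

y = feedforward(x, W)        # length 3, one value per output
e = backpropagate(delta, W)  # length 2, one value per input
```

Both functions index the same W in place; only the direction of accumulation changes, which is the software analogue of reading a transpose array along rows or along columns.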

Keywords:

Transpose

Topics:

Advanced Memory and Neural Computing

Advanced Neural Network Applications

Ferroelectric and Negative Capacitance Devices

10.1109/tc.2020.2980533

IKONIC IMAGERY IN THE SEVERELY SUBNORMAL

British Journal of Psychology (1972)

C. K. Mackay

Ability to reproduce and transpose matrices was studied in 23 adult SSN trainees. These subjects could transpose the matrices at a MA equivalent to the CA at which children normally do. The ability to reproduce the same matrices was not in evidence prior to the ability to transpose them, as it is in normal children. One possible interpretation of this finding is that cognitive development in SSNs is qualitatively different from that of normal children.

Transpose

10.1111/j.2044-8295.1972.tb01313.x

Citations (3)

Research and implementation of matrix transpose for real-time SAR imaging system

Computer Engineering and Applications Journal (2011)

Wang Jing-hua

Synthetic Aperture Radar (SAR) is a high-resolution imaging radar, and matrix transpose is a critical step in the real-time signal processing of SAR imaging; the efficiency of the matrix transpose greatly influences the performance of the signal processing system. Several popular methods can be used to perform the matrix transpose, such as in-row-out-column (IROC), in-column-out-row (ICOR), and two-page or three-page transpose methods. However, these methods are not very efficient. In this paper, a new matrix transpose method is proposed based on the existing methods. An actual hardware platform is used for real-time imaging with the new method. The efficiency of the matrix transpose is 78%, with an entire SAR imaging runtime of 10 seconds. Experimental results demonstrate that the proposed method is efficient for solving the matrix transpose problem.

Transpose

Matrix (chemical analysis)

Citations (4)

A decomposition method with minimal communication volume for parallelization of multi-dimensional FFTs

Truong Vinh Truong Duy, Taisuke Ozaki

We present a decomposition method for the parallelization of multi-dimensional FFTs with two distinguishing features: adaptive decomposition and transpose order awareness for achieving minimal communication volume. Based on a row-wise decomposition that translates the multi-dimensional data into one-dimensional data for equal allocation to the processes, our method can adaptively decompose the data in the lowest possible dimensions to reduce communication volume in the first place, unlike previous works that use pre-defined decomposition dimensions. This decomposition also admits many data transpose orders, and different transpose orders result in different communication volumes. By analyzing all the possible cases, we identify the best transpose orders with minimal communication volumes for 3-D, 4-D, and 5-D FFTs.

Transpose

10.1145/2464996.2467276

Citations (3)

Study on Matrix Transpose Algorithms

Computer Era (2011)

Donghong Shan

Based on different matrix storage methods, we conduct a comparative analysis of different matrix transpose methods, put forward an improved transpose method for the classic transpose algorithm of general matrices, and give several matrix transpose algorithms written in C. By analyzing the time and space complexity of these algorithms, we summarize their advantages and disadvantages.
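To make the idea of storage-dependent transposition concrete, here is a minimal sketch of three textbook variants. It is written in Python rather than the C used in the paper, and it does not reproduce the paper's improved algorithm; the choice of storage schemes (dense row-major lists and sparse coordinate triples) and all function names are assumptions made for illustration.

```python
def transpose(matrix):
    """Out-of-place transpose of a dense m x n matrix.

    O(m*n) time and O(m*n) extra space.
    """
    if not matrix:
        return []
    rows, cols = len(matrix), len(matrix[0])
    return [[matrix[i][j] for i in range(rows)] for j in range(cols)]

def transpose_square_in_place(matrix):
    """In-place transpose of a dense n x n matrix.

    Swaps each element above the diagonal with its mirror image:
    O(n^2) time and O(1) extra space.
    """
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]
    return matrix

def transpose_triples(triples):
    """Transpose a sparse matrix stored as (row, col, value) triples.

    Swaps the indices and re-sorts so the result is again in row-major
    order: O(t log t) time for t stored entries.
    """
    return sorted((col, row, value) for row, col, value in triples)

dense = transpose([[1, 2, 3], [4, 5, 6]])
square = transpose_square_in_place([[1, 2], [3, 4]])
sparse = transpose_triples([(0, 2, 5), (1, 0, 7)])
```

The trade-off is the usual one: the in-place version saves memory but applies only to square matrices, while the triple form makes the cost depend on the number of stored entries rather than on the matrix dimensions.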

Transpose

Matrix (chemical analysis)

Citations (0)

Matrix transpose on meshes with buses

Journal of Parallel and Distributed Computing (2016)

József Békési, Gábor Galambos

Transpose

Matrix (chemical analysis)

10.1016/j.jpdc.2016.05.015

Citations (0)

Entangled States with Positive Partial Transpose in Any-Dimensional Quantum System

Chinese Physics Letters (2005)

Yu-Chun Wu, Ping-Xing Chen, Zheng-Wei Zhou, Guo Guang-Can

It is well known that the positive partial transpose (PPT) criterion for determining separability is an operational separability criterion. However, in a high-dimensional (>6) situation, this criterion is not sufficient. How can we judge whether an entangled state in any high-dimensional quantum system is PPT or not? Here we propose a linear algebra method for checking the positivity of the partial transpose of a state and present a set of entangled mixed states with non-positive partial transpose.
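As a concrete illustration of what checking the partial transpose involves, the sketch below applies the standard definition to a two-qubit example. It is not the linear algebra method proposed in the paper: it assumes a real density matrix given as nested lists, and instead of computing eigenvalues it evaluates a quadratic form with a hand-picked vector, which is enough to show that a symmetric matrix is not positive semidefinite. All names are hypothetical.

```python
def partial_transpose_b(rho, dim_a, dim_b):
    """Partial transpose over subsystem B.

    Uses <i,j| rho^(T_B) |k,l> = <i,l| rho |k,j>, with the composite
    index of |i,j> stored as i * dim_b + j.
    """
    size = dim_a * dim_b
    out = [[0.0] * size for _ in range(size)]
    for i in range(dim_a):
        for j in range(dim_b):
            for k in range(dim_a):
                for l in range(dim_b):
                    out[i * dim_b + j][k * dim_b + l] = rho[i * dim_b + l][k * dim_b + j]
    return out

def quadratic_form(matrix, v):
    """Return v^T M v for a real vector v."""
    n = len(v)
    return sum(v[r] * matrix[r][c] * v[c] for r in range(n) for c in range(n))

# Density matrix of the two-qubit Bell state (|00> + |11>) / sqrt(2).
bell = [[0.5, 0.0, 0.0, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.5]]

bell_pt = partial_transpose_b(bell, 2, 2)

# A negative value of v^T M v proves M has a negative eigenvalue,
# so the partial transpose is not positive and the state is entangled.
witness = [0.0, 1.0, -1.0, 0.0]
value = quadratic_form(bell_pt, witness)
```

For larger systems a full eigenvalue computation would replace the hand-picked vector; the index bookkeeping in `partial_transpose_b` stays the same.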

Transpose

Peres–Horodecki criterion

10.1088/0256-307x/22/3/003

Citations (2)

Visualizing PROC TRANSPOSE

Daniel Boisvert (Cambridge, MA), Shafi Chowdhury (Shafi Consultancy)

PROC TRANSPOSE continues to confuse programmers. The ability to effectively transpose a data set is very important when working with different data structures and different data standards. This paper provides a non-technical approach to understanding the transpose procedure by showing the programmer how to visualize the expected output. PROC TRANSPOSE is deconstructed into three simple movements: what goes up, what goes down, and what goes into the middle. Programmers who have a hard time fully grasping PROC TRANSPOSE will benefit from this paper.

Transpose

Programmer

Citations (0)

A note on positive partial transpose blocks

AIMS Mathematics (2023)

Moh. Alakhrass

In this article, we study the class of positive partial transpose blocks. We introduce several inequalities related to this class with an emphasis on comparing the main diagonal and off-diagonal components of a $2 \times 2$ positive partial transpose block.

Transpose

10.3934/math.20231208

Citations (2)

Adaptive Modified Transpose Jacobian Control of a Two-fingered Robotic Hand

2022 10th RSI International Conference on Robotics and Mechatronics (ICRoM) (2021)

Mohammad-Javad Davari, S. Ali A. Moosavian, Ali Ghaffari, Mehdi Tale Masouleh

In this paper, an Adaptive Modified Transpose Jacobian approach is presented, which employs an adaptive algorithm to tune the control gains. To overcome the disadvantages of large gains and the need for trial and error to find gains in the Transpose Jacobian algorithm, a modified version, called Modified Transpose Jacobian, was proposed in the literature. In this paper, an adaptive method is used to tune the gains of the Modified Transpose Jacobian algorithm. Simulation results are presented in MATLAB and ADAMS software for a two-fingered robotic hand tracking a circular path with straight corners, in order to compare the tracking performance of the proposed algorithm with the Transpose Jacobian, Modified Transpose Jacobian, and Adaptive Transpose Jacobian control algorithms. The results reveal that the proposed Adaptive Modified Transpose Jacobian control performs better than the other three methods in high-speed motion. In low-speed motion, the Adaptive Transpose Jacobian and Modified Transpose Jacobian algorithms result in the best performance over the other non-adaptive algorithms.
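For readers unfamiliar with the baseline that these variants build on, the following minimal sketch shows the basic Transpose Jacobian control law, tau = J^T (Kp e + Kd e_dot), in which a task-space error is mapped to joint torques through the transposed Jacobian. It does not implement the modified or adaptive versions from the paper, and it uses a generic planar two-link arm rather than the two-fingered hand; the link lengths, gains, and function names are all hypothetical.

```python
import math

def jacobian(q1, q2, l1=1.0, l2=1.0):
    """Jacobian of the end-effector position of a planar two-link arm."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def transpose_jacobian_torque(q, pos_error, vel_error, kp=50.0, kd=5.0):
    """Joint torques tau = J^T (kp * e + kd * e_dot).

    pos_error and vel_error are task-space (x, y) tracking errors;
    kp and kd are fixed scalar gains (assumed values).
    """
    J = jacobian(q[0], q[1])
    # Virtual task-space force produced by the PD terms.
    f = [kp * pos_error[0] + kd * vel_error[0],
         kp * pos_error[1] + kd * vel_error[1]]
    # Multiply by J^T: row r of J^T is column r of J.
    return [J[0][0] * f[0] + J[1][0] * f[1],
            J[0][1] * f[0] + J[1][1] * f[1]]

# Arm with the elbow bent 90 degrees and a 0.1 m error along x.
tau = transpose_jacobian_torque(q=(0.0, math.pi / 2),
                                pos_error=(0.1, 0.0),
                                vel_error=(0.0, 0.0))
```

Because the gains here are fixed constants, they would need manual tuning for each task, which is the drawback that adaptive gain schemes aim to remove.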

Transpose

10.1109/icrom54204.2021.9663452

Citations (0)

TTLG - An Efficient Tensor Transposition Library for GPUs

2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2018)

Jyothi Vedurada, Arjun Suresh, Aravind Sukumaran-Rajam, Jinsung Kim, Changwan Hong

This paper presents a Tensor Transposition Library for GPUs (TTLG). A distinguishing feature of TTLG is that it also includes a performance prediction model, which can be used by higher level optimizers that use tensor transposition. For example, tensor contractions are often implemented by using the TTGT (Transpose-Transpose-GEMM-Transpose) approach - transpose input tensors to a suitable layout and then use high-performance matrix multiplication followed by transposition of the result. The performance model is also used internally by TTLG for choosing among alternative kernels and/or slicing/blocking parameters for the transposition. TTLG is compared with current state-of-the-art alternatives for GPUs. Comparable or better transposition times for the "repeated-use" scenario and considerably better "single-use" performance are observed.
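The TTGT approach described in the abstract can be sketched on a small example. This is a plain-Python illustration of the idea only, not TTLG's kernels or API: it assumes the contraction C[c][a][b] = sum over k of A[a][k][b] * B[c][k], with tensors stored as nested lists, and all function names are hypothetical.

```python
def matmul(X, Y):
    """Dense matrix product of nested lists (the GEMM step)."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]

def contract_ttgt(A, B):
    """C[c][a][b] = sum_k A[a][k][b] * B[c][k] via transpose, transpose, GEMM, transpose."""
    na, nk, nb = len(A), len(A[0]), len(A[0][0])
    nc = len(B)
    # Transpose 1: A[a][k][b] -> matrix with rows (a, b) and columns k.
    a_mat = [[A[a][k][b] for k in range(nk)]
             for a in range(na) for b in range(nb)]
    # Transpose 2: B[c][k] -> matrix with rows k and columns c.
    b_mat = [[B[c][k] for c in range(nc)] for k in range(nk)]
    # GEMM: result has rows (a, b) and columns c.
    m = matmul(a_mat, b_mat)
    # Transpose 3: move c to the front and unflatten (a, b).
    return [[[m[a * nb + b][c] for b in range(nb)] for a in range(na)]
            for c in range(nc)]

def contract_direct(A, B):
    """Reference implementation with explicit loops, for comparison."""
    na, nk, nb = len(A), len(A[0]), len(A[0][0])
    return [[[sum(A[a][k][b] * B[c][k] for k in range(nk))
              for b in range(nb)] for a in range(na)] for c in range(len(B))]

A = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]   # indexed A[a][k][b]
B = [[1, 10]]            # indexed B[c][k]

C = contract_ttgt(A, B)
```

The layout changes before and after the GEMM are exactly the transposition steps whose cost a tensor transposition library targets; here they are simple list comprehensions.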

Transpose

Transposition (logic)

10.1109/ipdps.2018.00067

Citations (6)