This paper presents a novel design-for-test (DFT) technique that allows core vendors to reduce the test complexity of the cores they market. The idea is to design a core so that it can be tested with a very small number of test vectors. The I/O pins of such a "designed for high test compression" (DFHTC) core are identical to those of an ordinary core, and for the system integrator, testing a DFHTC core is identical to testing an ordinary core. The only difference is that the DFHTC core requires significantly fewer test vectors, resulting in less test data as well as less test time (fewer scan vectors). This is achieved by carefully combining a parallel "test per clock" BIST scheme inside the core with the normal external testing scheme using a tester. The BIST structure inside the core generates weighted pseudo-random test vectors that detect a large number of faults in the core. Results indicate that DFHTC cores require significantly fewer test vectors than their ordinary counterparts, thereby greatly reducing test time and test storage.
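As an illustration only (not the paper's hardware design), the following sketch shows what a weighted pseudo-random pattern source does; the per-input weights, the seed, and the 8-input core are made-up placeholders.

```python
# Minimal sketch of weighted pseudo-random pattern generation, the kind of
# BIST source the abstract combines with external tester vectors.
# The per-input weights and the example core are hypothetical; a real DFHTC
# design would derive weights from the core's hard-to-detect faults and
# implement the generator in hardware.
import random

def weighted_patterns(weights, num_patterns, seed=2023):
    """Yield test patterns where bit i is 1 with probability weights[i]."""
    rng = random.Random(seed)
    for _ in range(num_patterns):
        yield tuple(1 if rng.random() < w else 0 for w in weights)

if __name__ == "__main__":
    # Hypothetical 8-input core: bias some inputs toward 1, others toward 0.
    weights = [0.5, 0.9, 0.1, 0.5, 0.75, 0.25, 0.5, 0.9]
    for pattern in weighted_patterns(weights, 4):
        print(pattern)
```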
Emerging non-volatile main memories (e.g., phase change memories) are currently an active focus of research. These memories provide an attractive alternative to DRAM due to their high density and low cost, but the dominant errors in these memories are of limited magnitude, caused by resistance drift. Hamming codes have been used extensively to protect DRAM because of their low decoding latency and low redundancy; for limited magnitude errors, however, traditional Hamming codes prove to be inefficient. This paper proposes a new systematic limited magnitude error correcting non-binary Hamming code specifically designed to address limited magnitude errors in multilevel cell memories storing multiple bits per cell. A general construction methodology is presented for correcting errors of limited magnitude, and it is compared to existing schemes addressing limited magnitude errors in phase change memories. A syndrome analysis shows the reduction in the total number of syndromes for limited magnitude error models. It is shown that the proposed codes provide better latency and complexity than existing limited magnitude error correcting non-binary Hamming codes, and better redundancy than the symbol extended version of binary Hamming codes.
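To make the syndrome-reduction argument concrete, the following sketch is illustrative arithmetic only (not the paper's code construction): it counts the single-cell error patterns that must be distinguished under a full-magnitude symbol error model versus a limited magnitude model. The cell count, level count, and magnitude bound are assumed values.

```python
# Counting the single-cell error patterns a code must distinguish.
# Fewer correctable patterns means fewer syndromes are needed, which is the
# source of the redundancy and latency savings the abstract refers to.
def single_cell_error_patterns(n_cells, q_levels, max_magnitude=None):
    """Count single-cell errors for n cells with q levels each.

    max_magnitude=None  -> any erroneous level (classic symbol error)
    max_magnitude=b     -> drift limited to +/- b levels
    """
    if max_magnitude is None:
        per_cell = q_levels - 1       # any of the other q-1 levels
    else:
        per_cell = 2 * max_magnitude  # +/-1 .. +/-b (worst case, ignoring boundaries)
    return n_cells * per_cell

if __name__ == "__main__":
    n, q = 64, 8                      # e.g., 64 cells, 3 bits per cell
    print("full magnitude :", single_cell_error_patterns(n, q))     # 448
    print("limited (b = 1):", single_cell_error_patterns(n, q, 1))  # 128
```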
This paper presents an efficient and scalable technique for lowering power consumption in checkers used for concurrent error detection. The basic idea is to exploit the functional symmetry of concurrent checkers with respect to their inputs, and to order the inputs such that switching activity (and hence power consumption) in the checker is minimized. The inputs of the checker are usually driven by the outputs of the function logic and the check symbol generator logic; spatial correlations between these outputs are analyzed to compute an input order that minimizes power consumption. The reduction in power consumption comes with no impact on area or performance and requires no alteration to the design flow. It is shown that the number of possible input orders increases exponentially with the number of inputs to the checker, so determining the optimum input order can become computationally prohibitive as the number of checker inputs grows. This paper presents an effective technique that builds a reduced cost function for this optimization problem and finds a near-optimal input order. The technique scales well with increasing numbers of checker inputs, and its computational cost is independent of the complexity of the checker. Experimental results demonstrate that the proposed technique reduces power consumption by 16% on average for several types of checkers.
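A small simulation sketch (not the paper's reduced cost function) conveys the underlying idea: the switching activity of an XOR-tree checker is estimated under different input orderings, and an ordering that places correlated signals on the same subtree toggles fewer internal nodes. The trace, circuit, and orderings below are made up.

```python
# Estimate switching activity of an XOR-tree parity checker by simulation,
# so candidate input orderings can be compared.
from itertools import permutations
import random

def xor_tree_nodes(bits):
    """Evaluate a balanced XOR tree; return the values of all internal nodes."""
    nodes, level = [], list(bits)
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        nodes.extend(nxt)
        level = nxt
    return nodes

def switching_activity(trace, order):
    """Count internal-node toggles over a trace for one input ordering."""
    toggles, prev = 0, None
    for vec in trace:
        nodes = xor_tree_nodes([vec[i] for i in order])
        if prev is not None:
            toggles += sum(a != b for a, b in zip(nodes, prev))
        prev = nodes
    return toggles

if __name__ == "__main__":
    rng = random.Random(7)
    # Correlated toy trace: inputs 0 and 2 almost always carry the same value.
    trace = []
    for _ in range(200):
        a = rng.getrandbits(1)
        trace.append((a, rng.getrandbits(1), a ^ (rng.random() < 0.1), rng.getrandbits(1)))
    best = min(permutations(range(4)), key=lambda o: switching_activity(trace, o))
    print("baseline order (0,1,2,3):", switching_activity(trace, (0, 1, 2, 3)))
    print("best order", best, ":", switching_activity(trace, best))
```

For realistic checkers the exhaustive search over permutations used above is exactly what becomes intractable; the paper's contribution is a reduced cost function that avoids it.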
The first part of this dissertation investigates techniques for delay fault diagnosis. With the advent of deep submicron technology and more aggressive clocking strategies, delay faults are becoming more prevalent. Diagnosing delay faults is essential for improving the yield and quality of integrated circuits. Given that a circuit has failed to meet its timing specifications, this work proposes new techniques to efficiently diagnose the cause of the faulty behavior. The techniques proposed in this thesis aim to reduce the search space for direct probing techniques such as E-beam probing. In addition, a ranking strategy is proposed to guide the direct probing. Procedures are described for adaptively generating additional test vectors to improve the diagnostic resolution for delay faults. Bridging defects are a defect type of growing importance, and a technique is proposed for diagnosing bridging faults that can potentially cause delay failures. As a result of greater densities and more aggressive clocking strategies, FPGAs have also become more susceptible to delay faults. An FPGA differs from a general integrated circuit in its capability to reconfigure the logic in the circuit-under-test (CUT). This unique feature is exploited in a systematic and efficient way in the proposed method to arrive at a more precise set of suspects.
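As a generic illustration only (not the dissertation's specific ranking strategy), the sketch below scores candidate fault sites by how well their predicted failures match the observed failing patterns; the site names and pattern sets are hypothetical.

```python
# Generic match-based suspect ranking: probing effort is then spent on the
# top-ranked sites first.
def rank_suspects(candidates, observed_failures):
    """candidates: {site: set of patterns the site would fail};
    observed_failures: set of patterns that actually failed on the tester."""
    scores = {}
    for site, predicted in candidates.items():
        explained = len(predicted & observed_failures)
        mispredicted = len(predicted - observed_failures)
        scores[site] = explained - mispredicted  # simple match-based score
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    candidates = {"net_a": {1, 4, 7}, "net_b": {1, 4}, "net_c": {2, 3, 9}}
    observed = {1, 4, 7}
    print(rank_suspects(candidates, observed))   # net_a ranked first
```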
The latter part of this dissertation focuses on fault diagnosis techniques for a built-in self-test (BIST) environment. Diagnosis in a BIST environment adds an extra level of difficulty compared to diagnosis in a non-BIST environment, because it is first necessary to determine from the collected information which scan elements have captured faulty responses and which vectors have produced a faulty response. This dissertation proposes a robust, low hardware overhead technique that can identify any number of failing scan cells. Finally, a novel diagnosis technique is presented that allows non-adaptive identification of a subset of the failing test vectors. Innovative pruning techniques are used to extract this information efficiently. While not all the failing BIST test vectors can be identified, results indicate that a significant number of them can be. This additional information allows faster and more precise diagnosis.
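The flavor of non-adaptive failing-vector identification can be conveyed with a toy group-testing sketch, which is not the dissertation's pruning technique: each "signature" here records only whether its group of test vectors contained a failure, and for a single failing vector the group outcomes recover its index without re-running the test.

```python
# Toy group-testing sketch: groups are formed from the bits of the vector
# index, so log2(N) pass/fail group outcomes locate one failing vector.
def group_outcomes(fail_flags):
    """fail_flags[v] is True if vector v produced a faulty response."""
    n_bits = max(1, (len(fail_flags) - 1).bit_length())
    groups = [False] * n_bits
    for v, failed in enumerate(fail_flags):
        if failed:
            for b in range(n_bits):
                if (v >> b) & 1:
                    groups[b] = True
    return groups

def recover_single_failing_vector(groups):
    """Valid only when exactly one vector failed."""
    return sum(1 << b for b, hit in enumerate(groups) if hit)

if __name__ == "__main__":
    fails = [False] * 64
    fails[37] = True                  # pretend vector 37 failed
    print(recover_single_failing_vector(group_outcomes(fails)))   # -> 37
```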
This paper presents a 3-stage continuous-flow linear decompression scheme for scan vectors that uses a variable number of bits to encode each vector. By using 3 stages of decompression, it can efficiently compress any test cube (i.e., a deterministic test vector in which the unassigned bit positions are left as don't cares) regardless of the number of specified (care) bits. As a result, no constraints need to be placed on the automatic test pattern generation (ATPG) process; any ATPG tool can be used with any amount of static or dynamic compaction. Experimental results are shown which demonstrate that the proposed scheme achieves extremely high encoding efficiency.
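The underlying linear-decompression principle can be sketched as solving a system of GF(2) equations restricted to the cube's care bits: the tester supplies compressed bits x, and the on-chip decompressor expands them as A·x. The matrix and test cube below are invented, and the paper's 3-stage continuous-flow architecture and variable-length encoding are not modeled here.

```python
# Find compressed bits x such that the linear expansion A·x (over GF(2))
# matches a test cube at its specified (care) positions.
def solve_gf2(rows, rhs, n_vars):
    """Gaussian elimination over GF(2); rows are bit-lists of length n_vars."""
    rows = [row[:] + [bit] for row, bit in zip(rows, rhs)]   # augment
    pivots, r = [], 0
    for c in range(n_vars):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[-1] and not any(row[:-1]) for row in rows):
        return None                                          # cube not encodable
    x = [0] * n_vars
    for i, c in enumerate(pivots):
        x[c] = rows[i][-1]
    return x

if __name__ == "__main__":
    # 6 scan cells driven by 4 compressed bits; '-' marks don't-care positions.
    A = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1],
         [0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0]]
    cube = ['1', '-', '0', '-', '1', '-']
    care = [i for i, v in enumerate(cube) if v != '-']
    x = solve_gf2([A[i] for i in care], [int(cube[i]) for i in care], 4)
    print("compressed bits:", x)
    print("expanded vector:", [sum(a * b for a, b in zip(A[i], x)) % 2 for i in range(6)])
```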
Logic obfuscation can protect designs from reverse engineering and IP piracy. In this paper, a new attack strategy based on applying brute force iteratively to each logic cone is described and shown to significantly reduce the number of brute force key combinations that need to be tried by an attacker. It is shown that inserting key gates based on MUXes is an effective approach to increase security against this type of attack. Experimental results are presented quantifying the threat posed by this type of attack along with the relative effectiveness of MUX key gates in countering it.
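A toy example (made-up locked circuit, not from the paper) illustrates why cone-by-cone brute force is so much cheaper: the attacker resolves the few key bits feeding each output cone separately, using the unlocked chip as an oracle, so the work grows with the sum rather than the product of the per-cone key spaces.

```python
# Cone-by-cone brute force on a toy locked circuit with a 4-bit key.
from itertools import product

KEY = (1, 0, 1, 1)                               # secret key of the "real" chip

def locked(x, k):
    """Two output cones; cone 0 uses key bits 0-1, cone 1 uses key bits 2-3."""
    o0 = (x[0] & x[1]) ^ k[0] ^ (x[2] & k[1])
    o1 = (x[2] | x[3]) ^ k[2] ^ (x[0] & k[3])
    return o0, o1

def oracle(x):                                   # attacker can query the real chip
    return locked(x, KEY)

def attack(cones):
    """cones maps output index -> indices of the key bits in its cone."""
    recovered = [None] * len(KEY)
    inputs = list(product((0, 1), repeat=4))
    for out, key_bits in cones.items():
        for guess in product((0, 1), repeat=len(key_bits)):
            k = [0] * len(KEY)
            for idx, bit in zip(key_bits, guess):
                k[idx] = bit
            if all(locked(x, k)[out] == oracle(x)[out] for x in inputs):
                for idx, bit in zip(key_bits, guess):
                    recovered[idx] = bit
                break
    return recovered

if __name__ == "__main__":
    # At most 4 + 4 = 8 key guesses instead of 2^4 = 16 for full brute force.
    print(attack({0: [0, 1], 1: [2, 3]}))
```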
This paper describes the fault-tolerant computing research currently active at Stanford University's Center for Reliable Computing. One focus is on tolerating hardware faults by means of software (software-implemented hardware fault tolerance). This work mainly targets faults caused by radiation-induced upsets. An experiment evaluating the techniques we have developed is currently running on the ARGOS satellite. Another focus is on fault-tolerance techniques for adaptive computing systems implemented with field-programmable gate arrays (FPGAs).
In secure computing, sensitive data must be kept private by protecting it from being obtained by an attacker. Existing techniques for computing with encrypted data are either prohibitively expensive (e.g., fully homomorphic encryption) or only work for special cases (e.g., only for linear circuits). This paper presents a lightweight methodology for computing with noise-obfuscated data by carefully selecting internal locations for noise cancellation in arbitrary logic circuits. Noise is inserted into the data before computation, partially cancelled during the computation, and fully cancelled at the outputs. While the proposed methodology does not provide the level of strong encryption that fully homomorphic encryption would provide, it has the advantage of being lightweight, easy to implement, and deployable with relatively minimal performance impact. A key idea in the proposed approach is to reduce the complexity of the noise cancellation logic by carefully selecting internal locations for local noise cancelling. This is done in a way that prevents more than one input per gate from propagating noise, thereby avoiding the complexity that arises from reconvergent noise propagation paths. One important application of the proposed scheme is protecting data inside a computing unit obtained from a third-party IP provider, where a hidden backdoor access mechanism or hardware Trojan could be maliciously inserted. Experimental results show that noise can be propagated to the outputs with overheads ranging from 13% to 56%.
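A deliberately simplified sketch shows noise masking with exact cancellation for XOR/parity logic only, where a mask propagates linearly; the paper's mechanism for arbitrary gates (limiting each gate to at most one noise-carrying input and cancelling at selected internal locations) is not reproduced here, and the names and values are made up.

```python
# Noise masking with exact cancellation, illustrated on XOR-reduce logic.
import random

def parity_unit(words):
    """The 'function logic': XOR-reduce a list of data words."""
    acc = 0
    for w in words:
        acc ^= w
    return acc

def obfuscated_parity(noisy_words, masks):
    """Compute on masked data, then cancel the accumulated mask at the output."""
    noisy_result = parity_unit(noisy_words)
    accumulated_mask = parity_unit(masks)   # noise propagates linearly through XOR
    return noisy_result ^ accumulated_mask  # full cancellation at the output

if __name__ == "__main__":
    rng = random.Random(1)
    words = [0x3A, 0xC5, 0x17, 0x88]
    masks = [rng.getrandbits(8) for _ in words]
    noisy = [w ^ m for w, m in zip(words, masks)]   # data leaves the trusted side masked
    assert obfuscated_parity(noisy, masks) == parity_unit(words)
    print(hex(obfuscated_parity(noisy, masks)))
```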
This paper presents a procedure for synthesizing multilevel circuits with concurrent error detection. All errors caused by single stuck-at faults are detected using a parity-check code. The synthesis procedure (implemented in Stanford CRC's TOPS synthesis system) fully automates the design process and reduces the cost of concurrent error detection compared with previous methods. An algorithm for selecting a good parity-check code for encoding the circuit outputs is described. Once the code has been selected, a new procedure called structure-constrained logic optimization is used to minimize the area of the circuit as much as possible while still using a circuit structure that ensures that single stuck-at faults cannot produce undetected errors. It is proven that the resulting implementation is path fault secure and, when augmented by a checker, forms a self-checking circuit. The actual layout areas required for self-checking implementations of benchmark circuits generated with the techniques described in this paper are compared with implementations based on Berger codes, single-bit parity, and duplicate-and-compare. Results indicate that the self-checking multilevel circuits generated with the procedure described here are significantly more economical.
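A bare-bones sketch illustrates the parity-based checking principle itself; the 3-output function, the single parity group, and the predictor below are made up, and the paper's code selection and structure-constrained optimization are not modeled.

```python
# Parity-based concurrent error detection on a toy combinational function:
# a predictor computes the expected output parity independently, and the
# checker flags any disagreement.
def function_logic(a, b, c):
    """Toy 3-output combinational function."""
    return (a & b, b ^ c, a | c)

def parity_predictor(a, b, c):
    """Independently predicts the parity of the three outputs."""
    return (a & b) ^ (b ^ c) ^ (a | c)

def check(a, b, c, outputs):
    """Return True when output parity agrees with the prediction."""
    actual_parity = outputs[0] ^ outputs[1] ^ outputs[2]
    return actual_parity == parity_predictor(a, b, c)

if __name__ == "__main__":
    a, b, c = 1, 0, 1
    good = function_logic(a, b, c)
    bad = (good[0] ^ 1, good[1], good[2])                 # inject a single-bit error
    print("fault-free passes:", check(a, b, c, good))     # True
    print("faulty detected  :", not check(a, b, c, bad))  # True
```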
A new combinational linear expansion scheme is proposed for decompression of scan vectors. It can adjust the width of the linear expansion every clock cycle, which eliminates the requirement that every scan bit-slice be in the output space of the linear decompressor. Depending on how heavily specified the current bit-slice is, the decompressor may load all scan chains or only a subset of them. Consequently, any scan vector can be generated with the proposed scheme regardless of the number or distribution of the specified bits, so any ATPG procedure can be used without constraints. Moreover, the scheme allows greater compression than fixed-width expansion techniques, since the ratio of the number of scan chains to the number of tester channels can be scaled much larger. A procedure for designing and optimizing the adjustable-width decompression hardware and obtaining the compressed data is described. Experimental data indicate that the proposed scheme is simple yet very effective.
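Illustrative arithmetic only (not the paper's decompressor design): the sketch below shows why adjustable-width expansion helps, since a fixed-width expander must budget tester bits for the most heavily specified bit-slice while an adjustable scheme spends them roughly in proportion to each slice's specified-bit count. The test cube is made up.

```python
# Compare tester-bit cost of fixed-width versus adjustable-width expansion
# under a simple per-slice cost model.
def tester_bits_fixed(slices, channels):
    """Fixed-width: every slice costs `channels` tester bits."""
    return len(slices) * channels

def tester_bits_adjustable(slices, min_channels=1):
    """Adjustable-width: a slice with s specified bits costs ~max(s, min) bits."""
    return sum(max(slice_.count('0') + slice_.count('1'), min_channels)
               for slice_ in slices)

if __name__ == "__main__":
    # Each string is one bit-slice across 8 scan chains; '-' is a don't-care.
    slices = ["1---0---", "--1-----", "01-1--0-", "--------", "----1---"]
    print("fixed width     :", tester_bits_fixed(slices, channels=4))   # 20
    print("adjustable width:", tester_bits_adjustable(slices))          # 9
```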