Fast and accurate pre-routing timing prediction is crucial in the very-large-scale integration (VLSI) design flow. Existing machine learning (ML)-assisted pre-routing timing evaluators neglect the impact of timing optimization, which may render them impractical in real circuit design flows. To model the impact of timing optimization, we propose an endpoint embedding framework that integrates netlist and layout information via multimodal fusion. An end-to-end flow is further developed for restructure-tolerant pre-routing prediction of global timing metrics. Comprehensive experiments on large-scale RISC-V designs at an advanced 7-nm technology node demonstrate the superiority of our model over state-of-the-art (SOTA) pre-routing timing evaluators.
Optical proximity correction (OPC) is a widely used resolution enhancement technique (RET) for printability optimization. Recently, rigorous numerical optimization and fast machine learning have become the research focus of OPC in both academia and industry, and the two complement each other in robustness and efficiency. We inspect the pattern distribution on a design layer and find that different sub-regions have different pattern complexity. We also find that many patterns appear repeatedly in the design layout, and these patterns may share optimized masks. We exploit these properties and propose a self-adaptive OPC framework to improve efficiency. First, we adaptively choose OPC solvers from an extensible solver pool for patterns of different complexity, co-optimizing speed and accuracy. Second, we prove the feasibility of reusing optimized masks for repeated patterns and build a graph-based dynamic pattern library that reuses stored masks to further speed up the OPC flow. Experimental results show that our framework achieves substantial improvement in both performance and efficiency.
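The two ideas in the abstract above, complexity-based solver selection and mask reuse for repeated patterns, can be sketched in a few lines. This is only an illustration under loud assumptions: the `PatternLibrary` class, the `canonical` helper, the vertex-count complexity measure, and the stub solvers are all hypothetical, and a plain dictionary stands in for the graph-based library the abstract describes.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int]
Pattern = Tuple[Point, ...]
Solver = Callable[[Pattern], str]


def canonical(polygon: List[Point]) -> Pattern:
    """Translate a polygon so its bounding box starts at the origin.

    Shifted copies with the same vertex order then share one key.
    """
    x0 = min(x for x, _ in polygon)
    y0 = min(y for _, y in polygon)
    return tuple((x - x0, y - y0) for x, y in polygon)


class PatternLibrary:
    """Hypothetical cache of optimized masks keyed by canonical pattern."""

    def __init__(self, fast: Solver, rigorous: Solver, threshold: int) -> None:
        self.fast = fast
        self.rigorous = rigorous
        self.threshold = threshold  # assumed complexity cutoff (vertex count)
        self.masks: Dict[Pattern, str] = {}
        self.hits = 0

    def optimize(self, polygon: List[Point]) -> str:
        key = canonical(polygon)
        if key in self.masks:  # repeated pattern: reuse the stored mask
            self.hits += 1
            return self.masks[key]
        # New pattern: complex shapes go to the rigorous solver.
        solver = self.rigorous if len(key) > self.threshold else self.fast
        self.masks[key] = solver(key)
        return self.masks[key]


calls: List[str] = []


def fast_solver(pattern: Pattern) -> str:
    calls.append("fast")  # stub standing in for a learning-based solver
    return "mask-fast"


def rigorous_solver(pattern: Pattern) -> str:
    calls.append("rigorous")  # stub standing in for a numerical solver
    return "mask-rigorous"


lib = PatternLibrary(fast_solver, rigorous_solver, threshold=4)
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
shifted = [(x + 10, y + 5) for x, y in square]
l_shape = [(0, 0), (3, 0), (3, 1), (1, 1), (1, 3), (0, 3)]
for poly in (square, shifted, l_shape):
    lib.optimize(poly)
print(calls, lib.hits)
```

The shifted square never reaches a solver because it maps to the same key as the first square. A real flow would also need to match rotated or mirrored copies and to translate the stored mask back to the pattern's location, which this sketch leaves out.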
In 3-D integrated circuits (3D-ICs), the through-silicon via (TSV) is a critical technique for providing vertical connections. However, yield is one of the key obstacles to adopting TSV-based 3D-IC technology in industry. Various fault-tolerance structures that use spare TSVs to repair faulty functional TSVs have been proposed in the literature for yield and reliability enhancement, but a valid structure cannot always be found because effective generation methods for fault-tolerance structures are lacking. In this paper, we focus on the problem of adaptive fault-tolerance structure (AFTS) generation. Given the relations between functional TSVs and spare TSVs, we first calculate the maximum number of tolerable faults in each TSV group. Then we propose an integer linear programming-based model to construct the AFTS with minimal multiplexer delay overhead and hardware cost. We further develop a speed-up technique through an efficient min-cost-max-flow model. All the proposed methodologies are embedded in a top-down TSV planning framework to form functional TSV groups and generate AFTSs. Experimental results show that, compared with the state of the art, the number of spare TSVs used for fault tolerance can be effectively reduced.
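To make the notion of repairing faulty functional TSVs with spares concrete, the sketch below checks whether a given set of faulty TSVs can each be assigned a distinct spare. It is a plain bipartite matching with augmenting paths, not the ILP or min-cost-max-flow model of the abstract. The function name `can_repair`, the `spares_of` relation, and the assumptions that spares are fault-free and that each spare repairs at most one TSV are all illustrative.

```python
from typing import Dict, List, Set


def can_repair(faulty: List[str], spares_of: Dict[str, List[str]]) -> bool:
    """Return True if every faulty functional TSV can get its own spare."""
    assigned: Dict[str, str] = {}  # spare -> faulty TSV currently using it

    def try_assign(tsv: str, seen: Set[str]) -> bool:
        for spare in spares_of.get(tsv, []):
            if spare in seen:
                continue
            seen.add(spare)
            # Take a free spare, or re-route the TSV that currently holds it.
            if spare not in assigned or try_assign(assigned[spare], seen):
                assigned[spare] = tsv
                return True
        return False

    return all(try_assign(tsv, set()) for tsv in faulty)


# Hypothetical group: three functional TSVs sharing two spares.
spares_of = {"f1": ["s1"], "f2": ["s1", "s2"], "f3": ["s2"]}
print(can_repair(["f1", "f2"], spares_of))
print(can_repair(["f1", "f2", "f3"], spares_of))
```

In this toy group any two faults are repairable but three are not, since only two spares exist. Enumerating fault sets with such a check is one naive way to find how many faults a group tolerates, at exponential cost.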
As minimum feature size and pitch spacing further decrease, triple patterning lithography (TPL) is a possible 193nm extension along the paradigm of double patterning lithography (DPL). However, TPL layout decomposition has received very little study. In this paper, we show that TPL layout decomposition is a more difficult problem than DPL layout decomposition. We then propose a general integer linear programming (ILP) formulation for TPL layout decomposition that simultaneously minimizes the numbers of conflicts and stitches. Since ILP scales very poorly, we propose three acceleration techniques that do not sacrifice solution quality: independent component computation, layout graph simplification, and bridge computation. For very dense layouts, the ILP formulation may still be too slow even with these speedup techniques. Therefore, we propose a novel vector programming formulation for TPL decomposition and solve it through an effective semidefinite programming (SDP) approximation. Experimental results show that the ILP with acceleration techniques reduces runtime by 82% compared to the baseline ILP. With the SDP-based algorithm, the runtime can be further reduced by 42% with some tradeoff between the stitch number (reduced by 7%) and the conflict number (9% more). For very dense layouts, the SDP-based algorithm achieves a 140× speed-up even compared with the accelerated ILP.
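The decomposition problem and the first acceleration technique, independent component computation, can be illustrated on a toy conflict graph. In the sketch below, features are vertices, an edge means two features are too close to share a mask, and each connected component is solved separately. Exhaustive search stands in for the ILP, stitches are ignored, and the names `components` and `decompose` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def components(n: int, edges: List[Edge]) -> List[List[int]]:
    """Split the conflict graph into connected components."""
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen: Set[int] = set()
    comps: List[List[int]] = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps


def decompose(n: int, edges: List[Edge], k: int = 3) -> Tuple[List[int], int]:
    """Assign one of k masks per feature, minimizing same-mask conflict edges.

    Exhaustive search per component (a stand-in for the ILP); small graphs only.
    """
    colors = [0] * n
    total = 0
    for comp in components(n, edges):
        members = set(comp)
        local = [(u, v) for u, v in edges if u in members and v in members]
        best_cost, best_col = None, {}
        for assign in product(range(k), repeat=len(comp)):
            col = dict(zip(comp, assign))
            cost = sum(col[u] == col[v] for u, v in local)
            if best_cost is None or cost < best_cost:
                best_cost, best_col = cost, col
        for v, c in best_col.items():
            colors[v] = c
        total += best_cost
    return colors, total


# Hypothetical layout: features 0-3 conflict pairwise, features 4-6 form a triangle.
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
edges = k4 + [(4, 5), (5, 6), (4, 6)]
colors, conflicts = decompose(7, edges)
print(colors, conflicts)
```

Solving the two components separately costs 3^4 + 3^3 assignments instead of 3^7 for the whole graph, and the result is the same because no conflict edge crosses components. The four mutually conflicting features cannot be 3-colored, so one conflict is unavoidable in this example.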
Deep neural networks (DNNs) have achieved significant success in a variety of real-world applications, e.g., image classification. However, the very large number of parameters restricts the efficiency of neural networks because of the large model size and intensive computation. To address this issue, various approximation techniques have been investigated, which seek a lightweight network with a smaller model size or faster inference at the cost of little performance degradation. Both low-rankness and sparsity are appealing properties for network approximation. In this paper, we propose a unified framework to compress convolutional neural networks (CNNs) by combining these two properties while taking the nonlinear activation into consideration. Each layer in the network is approximated by the sum of a structured sparse component and a low-rank component, which is formulated as an optimization problem. Then, an extended version of the alternating direction method of multipliers (ADMM) with guaranteed convergence is presented to solve the relaxed optimization problem. Experiments are carried out on VGG-16, AlexNet, and GoogLeNet with large image classification datasets. Our method outperforms previous work in terms of accuracy degradation, compression rate, and speedup ratio. The proposed method compresses the model substantially (up to a 4.9x reduction in parameters) with little or no loss in accuracy.
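The idea of approximating a weight matrix by a sparse component plus a low-rank component can be sketched with a simple alternating heuristic. This is not the extended ADMM of the abstract: it ignores the nonlinear activation, uses unstructured sparsity (the `k` largest residual entries) instead of structured sparsity, fixes the rank at one, and relies on power iteration, which is assumed to have converged after the given number of steps. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fro(a: Matrix) -> float:
    """Frobenius norm."""
    return math.sqrt(sum(x * x for row in a for x in row))


def rank1(a: Matrix, iters: int = 100) -> Matrix:
    """Approximate best rank-1 fit of `a` by power iteration."""
    m, n = len(a), len(a[0])
    # Start from the largest row so that a @ v is nonzero unless a is zero.
    v = list(max(a, key=lambda row: sum(x * x for x in row)))
    u = [0.0] * m
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return [[0.0] * n for _ in range(m)]
        u = [x / norm for x in u]
        v = [sum(a[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]


def top_k_entries(a: Matrix, k: int) -> Matrix:
    """Keep the k largest-magnitude entries of `a` and zero the rest."""
    m, n = len(a), len(a[0])
    cells = sorted(((i, j) for i in range(m) for j in range(n)),
                   key=lambda p: abs(a[p[0]][p[1]]), reverse=True)[:k]
    out = [[0.0] * n for _ in range(m)]
    for i, j in cells:
        out[i][j] = a[i][j]
    return out


def sparse_plus_low_rank(w: Matrix, k: int, rounds: int = 10) -> Tuple[Matrix, Matrix]:
    """Alternate a rank-1 fit of (w - s) with a k-sparse fit of (w - low)."""
    m, n = len(w), len(w[0])
    s = [[0.0] * n for _ in range(m)]
    low = [[0.0] * n for _ in range(m)]
    for _ in range(rounds):
        low = rank1(sub(w, s))
        s = top_k_entries(sub(w, low), k)
    return s, low


# Hypothetical 4x4 "layer": a rank-1 matrix plus two large spikes.
w = [[r * c for c in (1.0, -1.0, 2.0, 0.5)] for r in (1.0, 2.0, 3.0, 4.0)]
w[0][3] += 10.0
w[2][1] -= 8.0
s, low = sparse_plus_low_rank(w, k=2)
print("rank-1 only residual:", fro(sub(w, rank1(w))))
print("sparse + rank-1 residual:", fro(sub(sub(w, low), s)))
```

Storing a rank-r factorization of an m-by-n matrix takes r(m + n) values plus the nonzeros of the sparse part, which is where the parameter reduction comes from when r and the nonzero count are small.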
For next-generation technology nodes, multiple patterning lithography (MPL) has emerged as a key solution, e.g., triple patterning lithography (TPL) for 14/11nm and quadruple patterning lithography (QPL) for sub-10nm. In this paper, we propose a generic and robust layout decomposition framework for QPL, which can be further extended to handle any general K-patterning lithography (K>4). Our framework is based on a semidefinite programming (SDP) formulation with a novel coloring encoding. We also propose a fast yet effective coloring assignment that achieves significant speedup. To the best of our knowledge, this is the first work on general multiple patterning lithography layout decomposition.
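The abstract does not spell out its coloring encoding, so the sketch below shows only the standard vector encoding used in vector-programming relaxations of K-coloring, which is assumed here to be the kind of encoding an SDP formulation builds on. Each of the K masks is a unit vector, and any two distinct masks have inner product -1/(K-1). The helper names `color_vectors` and `conflict_indicator` are hypothetical.

```python
import math
from typing import List


def color_vectors(k: int) -> List[List[float]]:
    """K unit vectors with pairwise inner product -1/(K-1).

    Built by centering the standard basis of R^K and rescaling, so the
    vectors span a (K-1)-dimensional subspace.
    """
    scale = math.sqrt(k / (k - 1))
    return [[((1.0 if i == j else 0.0) - 1.0 / k) * scale for j in range(k)]
            for i in range(k)]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def conflict_indicator(vi: List[float], vj: List[float], k: int) -> float:
    """Equals 1 when two features share a mask and 0 when they differ."""
    return ((k - 1) * dot(vi, vj) + 1.0) / k


vecs = color_vectors(4)  # K = 4 corresponds to quadruple patterning
print(round(dot(vecs[0], vecs[1]), 6))
print(round(conflict_indicator(vecs[0], vecs[0], 4), 6),
      round(conflict_indicator(vecs[0], vecs[1], 4), 6))
```

Because the indicator is linear in the inner product, summing it over conflict edges gives an objective that a semidefinite relaxation can minimize once the discrete vectors are replaced by arbitrary unit vectors.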
Camera-based 3D object detectors are attractive because cameras are more widely deployed and cheaper than LiDAR sensors. We first revisit the prior stereo detector DSGN and the way it constructs stereo volumes to represent both 3D geometry and semantics. We refine the stereo modeling and propose an advanced version, DSGN++, which aims to enhance effective information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift the 2D information to the stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser connections and extracts depth-guided features. Second, to grasp differently spaced features, we present a novel stereo volume, Dual-view Stereo Volume (DSV), that integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, as the foreground region becomes less dominant in 3D space, we propose a multi-modal data editing strategy, Stereo-LiDAR Copy-Paste, which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors for all categories. Code is available at https://github.com/chenyilun95/DSGN2.
We synthesized a squaraine dye (F-0) to develop a method for detecting pyrophosphate (PPi) and alkaline phosphatase (ALP) by modulating the fluorescence of F-0. The fluorescence intensity of the F-0 system was quenched upon the addition of Cu2+ ions; however, it was restored when PPi was introduced due to the formation of a complex between PPi and Cu2+. Since ALP can hydrolyze PPi, the fluorescence of the system was quenched again upon the addition of ALP. Based on these principles, we established a fluorescent probe that exhibits an "off–on–off" fluorescence response. The detection limits of this method for PPi and ALP were 103 nmol dm−3 and 0.18 U dm−3, respectively. Moreover, this method demonstrates good selectivity and specificity and can be applied to the detection of PPi in actual samples.