Bei Yu

Chinese University of Hong Kong

Author Statistics

Papers

Citation

H-Index

i10-index

Co-Authors

David Z. Pan

The University of Texas at Austin

107

Yuzhe Ma

Hong Kong University of Science and Technology

Yibo Lin

Peking University

Haoyu Yang

Nvidia (United States)

Tinghuan Chen

Chinese University of Hong Kong, Shenzhen

Hao Geng

Shenyang University of Technology

Qi Sun

Northeastern University

Evangeline F. Y. Young

Chinese University of Hong Kong

Jhih-Rong Gao

Cadence Design Systems (United States)

Zhuolun He

Chinese University of Hong Kong

Cooperative Institutions

Chinese University of Hong Kong

163

Chinese Academy of Sciences

University of Hong Kong

Tsinghua University

Peking University

University of Chinese Academy of Sciences

Zhejiang University

Fudan University

Sun Yat-sen University

Beihang University

Research Field

Restructure-Tolerant Timing Prediction via Multimodal Fusion

Ziyi Wang Siting Liu Yuan Pu Song Chen Tsung-Yi Ho

Fast and accurate pre-routing timing prediction is crucial in the very-large-scale integration (VLSI) design flow. Existing machine learning (ML)-assisted pre-routing timing evaluators neglect the impact of timing optimization, which may render their approaches impractical in real circuit design flows. To model the impact of timing optimization, we propose an endpoint embedding framework that integrates netlist-layout information via multimodal fusion. An end-to-end flow is further developed for pre-routing restructure-tolerant prediction on global timing metrics. Comprehensive experiments on large-scale RISC-V designs with advanced 7-nm technology node demonstrate the superiority of our model compared to the SOTA pre-routing timing evaluators.

Netlist

Static timing analysis

10.1109/dac56929.2023.10247802

Citations (3)

AdaOPC: A Self-Adaptive Mask Optimization Framework For Real Design Patterns

arXiv (Cornell University) (2023)

Wenqian Zhao Xufeng Yao Ziyang Yu Guojin Chen Yuzhe Ma

Optical proximity correction (OPC) is a widely-used resolution enhancement technique (RET) for printability optimization. Recently, rigorous numerical optimization and fast machine learning are the research focus of OPC in both academia and industry, each of which complements the other in terms of robustness or efficiency. We inspect the pattern distribution on a design layer and find that different sub-regions have different pattern complexity. Besides, we also find that many patterns repetitively appear in the design layout, and these patterns may possibly share optimized masks. We exploit these properties and propose a self-adaptive OPC framework to improve efficiency. Firstly we choose different OPC solvers adaptively for patterns of different complexity from an extensible solver pool to reach a speed/accuracy co-optimization. Apart from that, we prove the feasibility of reusing optimized masks for repeated patterns and hence, build a graph-based dynamic pattern library reusing stored masks to further speed up the OPC flow. Experimental results show that our framework achieves substantial improvement in both performance and efficiency.

Robustness

Solver

Optical proximity correction

Design pattern

10.48550/arxiv.2303.12723

Citations (1)

Congestion-aware Global Routing using Deep Convolutional Generative Adversarial Networks

Zhonghua Zhou Ziran Zhu Jianli Chen Yuzhe Ma Bei Yu

The following topics are dealt with: learning (artificial intelligence); multiprocessing systems; optimisation; embedded systems; system-on-chip; neural nets; regression analysis; circuit optimisation; power aware computing; logic design.

10.1109/mlcad48534.2019.9142082

Citations (11)

Adaptive 3D-IC TSV Fault Tolerance Structure Generation

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2018)

Song Chen Qi Xu Bei Yu

In 3-D integrated circuits (3D-ICs), through silicon via (TSV) is a critical technique in providing vertical connections. However, the yield is one of the key obstacles to adopt the TSV-based 3D-ICs technology in industry. Various fault-tolerance structures using spare TSVs to repair faulty functional TSVs have been proposed in literature for yield and reliability enhancement, but a valid structure cannot always be found due to the lack of effective generation methods for fault-tolerance structures. In this paper, we focus on the problem of adaptive fault-tolerance structure (AFTS) generation. Given the relations between functional TSVs and spare TSVs, we first calculate the maximum number of tolerant faults in each TSV group. Then we propose an integer linear programming-based model to construct the AFTS with minimal multiplexer delay overhead and hardware cost. We further develop a speed-up technique through an efficient min-cost-max-flow model. All the proposed methodologies are embedded in a top-down TSV planning framework to form functional TSV groups and generate AFTSs. Experimental results show that, compared with state-of-the-art, the number of spare TSVs used for fault tolerance can be effectively reduced.

Spare part

10.1109/tcad.2018.2824284

Citations (11)

Layout decomposition for triple patterning lithography

International Conference on Computer Aided Design (2011)

Bei Yu Kun Yuan Boyang Zhang Duo Ding David Z. Pan

As minimum feature size and pitch spacing further decrease, triple patterning lithography (TPL) is a possible 193nm extension along the paradigm of double patterning lithography (DPL). However, there is very little study on TPL layout decomposition. In this paper, we show that TPL layout decomposition is a more difficult problem than that for DPL. We then propose a general integer linear programming formulation for TPL layout decomposition which can simultaneously minimize conflict and stitch numbers. Since ILP has very poor scalability, we propose three acceleration techniques without sacrificing solution quality: independent component computation, layout graph simplification, and bridge computation. For very dense layouts, even with these speedup techniques, ILP formulation may still be too slow. Therefore, we propose a novel vector programming formulation for TPL decomposition, and solve it through effective semidefinite programming (SDP) approximation. Experimental results show that the ILP with acceleration techniques can reduce 82% runtime compared to the baseline ILP. Using SDP based algorithm, the runtime can be further reduced by 42% with some tradeoff in the stitch number (reduced by 7%) and the conflict (9% more). However, for very dense layouts, SDP based algorithm can achieve 140× speed-up even compared with accelerated ILP.

Speedup

10.5555/2132325.2132327

Citations (91)

CBTune: Contextual Bandit Tuning for Logic Synthesis

Fangzhou Liu Zehua Pei Ziyang Yu Haisheng Zheng Zhuolun He

10.23919/date58400.2024.10546766

Citations (0)

A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks

arXiv (Cornell University) (2018)

Yuzhe Ma Ran Chen Wei Li Fanhua Shang Wenjian Yu

Deep neural networks (DNNs) have achieved significant success in a variety of real world applications, e.g., image classification. However, tons of parameters in the networks restrict the efficiency of neural networks due to the large model size and the intensive computation. To address this issue, various approximation techniques have been investigated, which seek a lightweight network with little performance degradation in exchange for smaller model size or faster inference. Both low-rankness and sparsity are appealing properties for the network approximation. In this paper we propose a unified framework to compress the convolutional neural networks (CNNs) by combining these two properties, while taking the nonlinear activation into consideration. Each layer in the network is approximated by the sum of a structured sparse component and a low-rank component, which is formulated as an optimization problem. Then, an extended version of alternating direction method of multipliers (ADMM) with guaranteed convergence is presented to solve the relaxed optimization problem. Experiments are carried out on VGG-16, AlexNet and GoogLeNet with large image classification datasets. The results outperform previous work in terms of accuracy degradation, compression rate and speedup ratio. The proposed method is able to remarkably compress the model (with up to 4.9x reduction of parameters) at a cost of little loss or without loss on accuracy.

Speedup

Rank (graph theory)

Component (thermodynamics)

Deep Neural Networks

10.48550/arxiv.1807.10119

Citations (1)

Layout decomposition for quadruple patterning lithography and beyond

Bei Yu David Z. Pan

For next-generation technology nodes, multiple patterning lithography (MPL) has emerged as a key solution, e.g., triple patterning lithography (TPL) for 14/11nm, and quadruple patterning lithography (QPL) for sub-10nm. In this paper, we propose a generic and robust layout decomposition framework for QPL, which can be further extended to handle any general K-patterning lithography (K>4). Our framework is based on the semidefinite programming (SDP) formulation with novel coloring encoding. Meanwhile, we propose fast yet effective coloring assignment and achieve significant speedup. To our best knowledge, this is the first work on the general multiple patterning lithography layout decomposition.

Next-generation lithography

Speedup

10.1109/dac.2014.6881380

Citations (21)

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

arXiv (Cornell University) (2022)

Yilun Chen Shijia Huang Shu Liu Bei Yu Jiaya Jia

Camera-based 3D object detectors are welcome due to their wider deployment and lower price than LiDAR sensors. We first revisit the prior stereo detector DSGN for its stereo volume construction ways for representing both 3D geometry and semantics. We polish the stereo modeling and propose the advanced version, DSGN++, aiming to enhance effective information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift the 2D information to stereo volume, we propose depth-wise plane sweeping (DPS) that allows denser connections and extracts depth-guided features. Second, for grasping differently spaced features, we present a novel stereo volume -- Dual-view Stereo Volume (DSV) that integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, as the foreground region becomes less dominant in 3D space, we propose a multi-modal data editing strategy -- Stereo-LiDAR Copy-Paste, which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors for all categories. Code is available at https://github.com/chenyilun95/DSGN2.

Benchmark (surveying)

Stereo cameras

10.48550/arxiv.2204.03039

Citations (0)

Squaraine dye as a fluorescent probe for highly sensitive detection of pyrophosphate and alkaline phosphatase

Analytical Sciences (2024)

Wenxuan Zhu Shuhua Zhao Bei Yu Ye Tao Chaoyang Wang

We synthesized a squaraine dye (F-0) to develop a method for detecting pyrophosphate (PPi) and alkaline phosphatase (ALP) by modulating the fluorescence of F-0. The fluorescence intensity of the F-0 system was quenched upon the addition of Cu2+ ions; however, it was restored when PPi was introduced due to the formation of a complex between PPi and Cu2+. Since ALP can hydrolyze PPi, the fluorescence of the system was quenched again upon the addition of ALP. Based on these principles, we established a fluorescent probe that exhibits an "off–on–off" fluorescence response. The detection limits of this method for PPi and ALP were 103 nmol dm−3 and 0.18 U dm−3, respectively. Moreover, this method demonstrates good selectivity and specificity and can be applied to the detection of PPi in actual samples.

Alkaline hydrolysis

10.1007/s44211-024-00697-2

Citations (1)