Effective thermal management for 3D integrated circuits (3D ICs) is becoming increasingly challenging due to ever-increasing power density and chip design complexity; traditional heat sinks are expected to quickly reach their limits in meeting the cooling needs of 3D ICs. Alternatively, the integrated liquid-cooled microchannel heat sink has emerged as one of the most effective solutions. In this paper, we present fast multigrid and block-tridiagonally preconditioned graphics processing unit (GPU) based thermal simulation algorithms for 3D ICs. Unlike CPU-based solver development, in which existing sophisticated numerical simulation tools (matrix solvers) can be readily adopted and implemented, GPU-based thermal simulation demands more effort in the algorithm and data structure design phase, and requires careful consideration of the GPU's thread/memory organization, data access/communication patterns, arithmetic intensity, and hardware occupancy. As shown by various experimental results, our GPU-based 3D thermal simulation solvers can achieve more than 360× speedups over the best available direct solvers and more than 35× speedups over CPU-based iterative solvers, without loss of accuracy.
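As a concrete illustration of the preconditioned iterative approach described above, the sketch below runs conjugate gradients on a small 7-point finite-difference thermal grid with a line (tridiagonal-block) preconditioner. The grid size, uniform load, and CPU/SciPy setting are illustrative stand-ins only; the paper's solvers target GPUs and are far more elaborate.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.linalg import solve_banded

# Small 3D 7-point finite-difference Laplacian (steady-state heat equation)
# built via Kronecker sums; n and the unit load are illustrative choices.
n = 10
I = sp.identity(n, format="csr")
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
A = (sp.kron(sp.kron(T, I), I) + sp.kron(sp.kron(I, T), I)
     + sp.kron(sp.kron(I, I), T)).tocsr()
b = np.ones(A.shape[0])            # uniform power-density load (assumed)

# Line preconditioner: one tridiagonal solve per z-line, i.e. a
# block-tridiagonal approximation of A (diagonal 6, off-diagonals -1).
ab = np.zeros((3, n))
ab[0, 1:] = -1.0                   # superdiagonal
ab[1, :] = 6.0                     # main diagonal of the 7-point stencil
ab[2, :-1] = -1.0                  # subdiagonal

def line_precond(r):
    R = r.reshape(n * n, n).T      # each column is one z-line
    return solve_banded((1, 1), ab, R).T.ravel()

M = spla.LinearOperator(A.shape, matvec=line_precond)
x, info = spla.cg(A, b, M=M, maxiter=500)   # info == 0 on convergence
```

Each preconditioner application amounts to independent tridiagonal solves, which is exactly the kind of regular, batched work that maps well onto a GPU's thread organization.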
Realizable power grid reduction is key to efficient design and verification of today's large-scale power delivery networks (PDNs). Existing state-of-the-art realizable reduction techniques for interconnect circuits, such as the TICER algorithm, are not well suited for effective power grid reduction: reducing mesh-structured power grids by TICER's nodal elimination scheme may introduce an excessive number of new edges, so the reduced grid can be even harder to solve than the original due to the drastically increased sparse matrix density. In this work, we present a novel geometric-template-based technique for reducing large-scale flip-chip power grids. Our method first creates a geometric template according to the original power grid topology and then performs novel iterative grid corrections that improve accuracy by matching the electrical behavior of the reduced template grid to that of the original grid. Our experimental results show that the proposed reduction method can reduce industrial power grid designs by up to 95% with very satisfactory solution quality.
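The fill-in problem can be made concrete with a toy star-mesh (Schur-complement) nodal elimination of the kind TICER performs, here on a hypothetical 5-node star of unit conductances. Eliminating a degree-d node connects all d former neighbors pairwise, creating up to d(d-1)/2 new edges, which is what densifies mesh-structured grids.

```python
import numpy as np

def eliminate_node(G, k):
    """Schur complement of conductance (Laplacian) matrix G after
    eliminating node k -- star-mesh transformation on its neighbors."""
    idx = [i for i in range(G.shape[0]) if i != k]
    return G[np.ix_(idx, idx)] - np.outer(G[idx, k], G[k, idx]) / G[k, k]

# Tiny 5-node star: node 0 connects to nodes 1..4 with unit conductances.
n = 5
G = np.zeros((n, n))
for j in range(1, n):
    G[0, j] = G[j, 0] = -1.0
np.fill_diagonal(G, -G.sum(axis=1))   # diagonal = sum of conductances

Gr = eliminate_node(G, 0)
# The 4 former neighbors are now fully connected: 6 new edges replace 4.
new_edges = np.count_nonzero(np.triu(Gr, 1))
```

On a mesh, interior nodes have degree 4 or more, and repeated eliminations compound this pairwise fill-in, which is the motivation for avoiding pure nodal elimination on power grids.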
While effective thermal management for 3D-ICs is becoming increasingly challenging due to ever-increasing power density and chip design complexity, traditional heat sinks are expected to quickly reach their limits in meeting the cooling needs of 3D-ICs. Alternatively, the integrated liquid-cooled microchannel heat sink has become one of the most effective solutions. For the first time, we present fast GPU-based thermal simulation methods for 3D-ICs with integrated microchannel cooling. Based on the physical heat dissipation paths of 3D-ICs with integrated microchannels, we propose novel preconditioned iterative methods that can be efficiently accelerated on the GPU's massively parallel computing platform. Unlike the CPU-based solver development environment, in which many existing sophisticated numerical simulation methods (matrix solvers) can be readily adopted and implemented, GPU-based thermal simulation demands more effort in the algorithm and data structure design phase, and requires careful consideration of the GPU's thread/memory organization, data access/communication patterns, arithmetic intensity, and hardware occupancy. As shown by various experimental results, our GPU-based 3D thermal simulation solvers can achieve up to 360× speedups over the best available direct solvers and more than 35× speedups compared with CPU-based iterative solvers, without loss of accuracy.
Motivated by the widespread use of underwater vehicles, we studied a special type of underwater glider equipped with energy-saving, diamond-structured rotatable wings that can improve the vehicle's lift-drag ratio, optimize hydrodynamic performance, and enhance flight quality. Using adaptive meshing and the standard k-ε turbulence model in Fluent, lift and drag forces were calculated for steady gliding at 0.5 m/s under different angles of attack in the vertical plane. Compared to a traditional sweptback wing, the best configuration obtained in this study, a diamond-shaped glider with a wing tilt angle of 15° and an aspect ratio of 3.69, improves the lift-drag ratio by 7.14%.
To improve upon the efficiency of direct solution methods in SPICE-accurate integrated circuit (IC) simulations, preconditioned iterative solution techniques have been widely studied over the past decades. However, it remains an extremely challenging task to develop robust yet efficient general-purpose preconditioning methods that can handle various types of large-scale IC problems. In this paper, building on recent graph sparsification research, we propose circuit-oriented general-purpose support-circuit preconditioning (GPSCP) methods to dramatically reduce the sparse matrix solution time and memory cost of SPICE-accurate IC simulations. By sparsifying the Laplacian matrix extracted from the original circuit network using graph sparsification techniques, general-purpose support circuits can be efficiently leveraged as preconditioners for solving large Jacobian matrices through Krylov-subspace iterations. Additionally, a performance-model-guided graph sparsification framework is proposed to automatically build nearly optimal GPSCP solvers. Our experimental results for a variety of large-scale IC designs show that the proposed preconditioning techniques can achieve up to 18× runtime speedups and 7× memory reductions in DC and transient simulations compared to state-of-the-art direct solution methods.
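The support-circuit idea can be sketched in a few lines: sparsify a circuit-like Laplacian down to a spanning tree (a Vaidya-style support graph, used here as a simple stand-in for the spectral sparsifiers GPSCP builds on), factorize the sparsified matrix once, and apply that factorization as a preconditioner inside CG. The random graph and regularization constants below are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
n = 200
# Random weighted graph as a hypothetical stand-in for an extracted
# circuit conductance network.
W = sp.random(n, n, density=0.05, random_state=0, format="csr")
W = W + W.T                                   # symmetric conductances
L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W
A = (L + 1e-3 * sp.identity(n)).tocsc()       # grounding makes it SPD

# Support graph: a spanning tree of the conductance graph, kept sparse
# so its one-time factorization is cheap to build and apply.
Tree = minimum_spanning_tree(W)
Tree = Tree + Tree.T
Ls = sp.diags(np.asarray(Tree.sum(axis=1)).ravel()) - Tree
P = spla.splu((Ls + 1e-3 * sp.identity(n)).tocsc())
M = spla.LinearOperator(A.shape, matvec=P.solve)

b = rng.standard_normal(n)
x, info = spla.cg(A, b, M=M, maxiter=500)     # info == 0 on convergence
```

The factorization of the tree Laplacian has essentially no fill-in, which is why a sparsified support circuit can be factored and reused cheaply across many Krylov iterations, whereas directly factoring the full matrix is what the paper seeks to avoid.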
Hypergraphs allow modeling problems with multiway, higher-order relationships. However, the computational cost of most existing hypergraph-based algorithms depends heavily on the size of the input hypergraph. To address these ever-increasing computational challenges, coarsening can be applied to preprocess a given hypergraph by aggressively aggregating its vertices (nodes). However, state-of-the-art hypergraph partitioning (clustering) methods that incorporate heuristic coarsening techniques are not optimized for preserving the structural (global) properties of hypergraphs. In this work, we propose an efficient spectral hypergraph coarsening scheme (HyperSF) that well preserves the original spectral (structural) properties of hypergraphs. Our approach leverages a recent strongly local, max-flow-based clustering algorithm to detect sets of hypergraph vertices that minimize the ratio cut. To further improve efficiency, we propose a divide-and-conquer scheme that leverages spectral clustering of the bipartite graphs corresponding to the original hypergraphs. Our experimental results for a variety of hypergraphs extracted from real-world VLSI design benchmarks show that the proposed coarsening algorithm significantly improves both the multi-way conductance of hypergraph clustering and runtime efficiency compared with existing state-of-the-art algorithms.
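For reference, one common definition of the hyperedge-cut conductance behind the ratio-cut objective above can be computed directly: the number of hyperedges crossing a cluster, divided by the smaller of the two sides' degree volumes. The small hypergraph and unit edge weights below are hypothetical, and the paper's exact weighting may differ.

```python
def hyper_conductance(hyperedges, S, nv):
    """Conductance of vertex cluster S: cut hyperedges / min side volume."""
    S = set(S)
    # A hyperedge is cut if it touches both S and its complement.
    cut = sum(1 for e in hyperedges if 0 < len(S & set(e)) < len(e))
    deg = [0] * nv                       # vertex degree = #incident hyperedges
    for e in hyperedges:
        for v in e:
            deg[v] += 1
    vol_S = sum(deg[v] for v in S)
    vol_rest = sum(deg) - vol_S
    return cut / min(vol_S, vol_rest)

# Toy hypergraph: 6 vertices, 5 hyperedges, with (2, 3) bridging two groups.
hyperedges = [(0, 1, 2), (1, 2), (2, 3), (3, 4, 5), (4, 5)]
phi = hyper_conductance(hyperedges, {0, 1, 2}, 6)   # only edge (2, 3) is cut
```

A coarsening scheme that preserves spectral properties should keep low-conductance clusters like this one intact when aggregating vertices.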
Decoupling capacitors (decaps) have been widely used to effectively reduce dynamic power supply noise. Traditional decap budgeting algorithms usually rely on sensitivity-based nonlinear optimization or conjugate gradient (CG) methods, which can be prohibitively expensive for large-scale decap budgeting problems and cannot be easily parallelized. In this paper, we propose a hierarchical cross-entropy-based optimization technique that is more efficient and parallel-friendly. Cross-entropy (CE) is an advanced optimization framework that exploits rare-event probability theory and importance sampling. To achieve high efficiency, a sensitivity-guided cross-entropy (SCE) algorithm is introduced that integrates CE with a partitioning-based sampling strategy to effectively reduce the solution space of large-scale decap budgeting problems. Compared to an improved CG method and the conventional CE method, SCE with Latin hypercube sampling (SCE-LHS) provides 2× speedups while achieving up to 25% improvement in power supply noise. To further improve decap optimization solution quality, an SCE with sequential importance sampling (SCE-SIS) method is also studied and implemented. Compared to SCE-LHS, in similar runtime, SCE-SIS achieves a further 16.8% reduction in total power supply noise.
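The plain CE loop underlying these methods can be sketched as follows: sample candidate decap assignments from a parametric distribution, keep the elite (lowest-noise) fraction, and refit the distribution to that elite set. The quadratic "noise" objective and all sizes below are hypothetical stand-ins; the paper's SCE adds sensitivity guidance, partitioning, Latin hypercube sampling, and sequential importance sampling on top of this basic loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise(x):
    # Hypothetical power-supply-noise metric over normalized decap sizes.
    target = np.array([0.3, 0.7, 0.5, 0.2])
    return np.sum((x - target) ** 2)

d, n_samples, n_elite = 4, 200, 20
mu, sigma = np.full(d, 0.5), np.full(d, 0.5)   # initial sampling distribution

for _ in range(30):
    # Sample candidate solutions, clipped to valid decap sizes [0, 1].
    X = np.clip(mu + sigma * rng.standard_normal((n_samples, d)), 0.0, 1.0)
    scores = np.array([noise(x) for x in X])
    elite = X[np.argsort(scores)[:n_elite]]    # rare-event / elite samples
    # Refit the sampling distribution to the elite set (CE update).
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-12
```

Each iteration's samples are independent, which is why the CE framework is described as parallel-friendly: all objective evaluations within an iteration can run concurrently.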