Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
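The abstract does not state the pruning criterion itself, so the following is only a minimal sketch of one way to make "preserving the gradient flow" concrete. It assumes a first-order estimate: perturbing the weights by δ changes the squared gradient norm by roughly 2 δᵀHg, so removing weight i (δ = -θ_i e_i) gives a score proportional to -θ_i (Hg)_i. The toy quadratic loss, the function names, and the rule of pruning the highest-scoring weights are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def gradient_preservation_scores(theta, hessian, grad):
    """First-order estimate of how removing each weight changes ||g||^2.

    Assumption: score_i = -theta_i * (H g)_i, dropping the constant factor 2.
    A high score means removal is estimated to hurt gradient flow least.
    """
    return -theta * (hessian @ grad)

def prune_mask(scores, prune_fraction):
    """Boolean mask (True = keep) that drops the highest-scoring weights."""
    n_prune = int(round(prune_fraction * scores.size))
    mask = np.ones(scores.size, dtype=bool)
    if n_prune > 0:
        mask[np.argsort(scores)[-n_prune:]] = False
    return mask

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta - b^T theta (hypothetical),
# for which the gradient is A @ theta - b and the Hessian is A.
rng = np.random.default_rng(0)
n = 10
M = rng.normal(size=(n, n))
A = M @ M.T + np.eye(n)      # symmetric positive definite Hessian
b = rng.normal(size=n)
theta = rng.normal(size=n)   # stands in for weights at initialization
grad = A @ theta - b

scores = gradient_preservation_scores(theta, A, grad)
mask = prune_mask(scores, prune_fraction=0.8)
print("kept weights:", int(mask.sum()), "of", n)
```

For a real network the Hessian is never formed explicitly; the product Hg would instead come from a Hessian-vector product computed by automatic differentiation.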
In the process of China's reform and opening up and modern socialist construction, Deng Xiaoping explored the problem of social harmony in depth and formed his thought on a harmonious society. The defining points of a harmonious society include democracy and the rule of law as its guarantee, justice and fairness as its foundation, vigour and vitality as its resource, and stability and order as its symbol.
Deng Xiaoping's theory of anti-corruption is concise and thought-provoking, and it is of great practical significance for the current fight against corruption. When studying Deng's theory, we should understand the danger of corruption so that we are more determined in fulfilling the task, appreciate its complexity so that we combat corruption by all available means, and accept that it is a long and hard task so that we are mentally prepared.
The aim of this research was to evaluate the efficacy of the cystoscopic extraction and external drainage techniques after unsuccessful antegrade stenting in severe transplanted ureteral obstruction. A total of 26 patients with severe transplanted ureteral obstruction in whom the cystoscopic extraction technique and/or external drainage technique was performed were retrospectively evaluated. After the severe obstruction was successfully traversed, balloon dilatation followed by double-J stent insertion was performed. Of the 26 patients (male:female, 9:4; mean age, 38.1 years) in whom ureteral stenting with the conventional procedure had failed, 16 patients underwent successful stenting with the cystoscopic extraction technique, and 10 patients underwent successful stenting following external drainage. The mean serum creatinine of the 26 patients before stenting was 42.9 mg/dL (range, 32.7 to 54.1 mg/dL), which decreased to 10.3 mg/dL (range, 8.7 to 11.8 mg/dL) after stenting. The complications of the procedure were lower abdominal pain in 22 patients and gross hematuria in 9 patients. All complications were relieved with medical care within 3 to 5 days after the procedure. No major complications occurred. The cystoscopic extraction technique and external drainage technique are safe and useful for traversing a severe transplanted ureteral obstruction after a failed conventional procedure.
Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood. Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them, using insights from wide NN kernel theory to improve signal propagation in vanilla DNNs (which we define as networks without skips or normalisation). However, these approaches are incompatible with the self-attention layers present in transformers, whose kernels are intrinsically more complicated to analyse and control. And so the question remains: is it possible to train deep vanilla transformers? We answer this question in the affirmative by designing several approaches that use combinations of parameter initialisations, bias matrices and location-dependent rescaling to achieve faithful signal propagation in vanilla transformers. Our methods address various intricacies specific to signal propagation in transformers, including the interaction with positional encoding and causal masking. In experiments on WikiText-103 and C4, our approaches enable deep transformers without normalisation to train at speeds matching their standard counterparts, and deep vanilla transformers to reach the same performance as standard ones after about 5 times more iterations.
Deng Xiaoping's exposition on the harmonious society contains dialectical thinking and reflects the dialectical unity of democracy and law, fairness and efficiency, reform and stability, humanity and nature, and so on, which gives Deng Xiaoping Thought on the harmonious society its dialectical character.
Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns. In this work, we study how the critical batch size changes based on properties of the optimization algorithm, including acceleration and preconditioning, through two different lenses: large scale experiments, and analysis of a simple noisy quadratic model (NQM). We experimentally demonstrate that optimization algorithms that employ preconditioning, specifically Adam and K-FAC, result in much larger critical batch sizes than stochastic gradient descent with momentum. We also demonstrate that the NQM captures many of the essential features of real neural network training, despite being drastically simpler to work with. The NQM predicts our results with preconditioned optimizers, previous results with accelerated gradient descent, and other results around optimal learning rates and large batch training, making it a useful tool to generate testable predictions about neural network optimization.
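The abstract does not spell out the noisy quadratic model, so the sketch below rests on stated assumptions: a diagonal quadratic loss 0.5 Σ h_i θ_i², gradient noise with per-coordinate variance c_i / B for batch size B, and plain SGD with a fixed learning rate. Under those assumptions the second moment of each coordinate follows E[θ_i²] ← (1 - α h_i)² E[θ_i²] + α² c_i / B, so the expected loss can be tracked exactly without sampling. The curvatures, the choice c = h, and all function names are hypothetical.

```python
import numpy as np

def expected_loss_curve(h, c, m0, lr, batch_size, steps):
    """Expected loss per step for SGD on a diagonal noisy quadratic.

    h: curvatures, c: per-example gradient noise variances,
    m0: initial second moments E[theta_i^2] (all illustrative inputs).
    """
    h = np.asarray(h, dtype=float)
    c = np.asarray(c, dtype=float)
    m = np.array(m0, dtype=float)
    losses = np.empty(steps)
    for t in range(steps):
        losses[t] = 0.5 * np.sum(h * m)
        # Contraction from the gradient step plus injected gradient noise.
        m = (1.0 - lr * h) ** 2 * m + lr ** 2 * c / batch_size
    return losses

def steps_to_target(h, c, m0, lr, batch_size, target, max_steps=10_000):
    """First step at which the expected loss is <= target, or None."""
    losses = expected_loss_curve(h, c, m0, lr, batch_size, max_steps)
    reached = np.nonzero(losses <= target)[0]
    return int(reached[0]) if reached.size else None

h = np.array([1.0, 0.1])   # two curvature directions (hypothetical)
c = h.copy()               # assumption: noise variance equal to curvature
m0 = np.ones(2)
for B in (64, 256, 1024, 4096):
    print(B, steps_to_target(h, c, m0, lr=0.5, batch_size=B, target=0.01))
```

With the learning rate held fixed, the number of steps stops improving once the noise term is negligible next to the deterministic contraction, which is one simple picture of diminishing returns from larger batches. A fuller study would retune the learning rate for each batch size, which this sketch does not do.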
To evaluate the effectiveness of high tibial osteotomy (HTO) assisted by three-dimensional (3-D) printing technology for correction of varus knee with osteoarthritis. Between January 2014 and June 2015, 16 patients (20 knees) with varus knee and osteoarthritis underwent HTO assisted by 3-D printing technology; a locking compression plate was used for internal fixation after HTO. There were 6 males and 10 females, aged 30-60 years (mean, 45.5 years). The disease duration was 1-10 years (mean, 6.2 years). A unilateral knee was involved in 12 cases and bilateral knees in 4 cases. According to Koshino's staging system, 3 knees were classified as stage I, 7 knees as stage II, 8 knees as stage III, and 2 knees as stage IV. The preoperative Hospital for Special Surgery (HSS) knee score was 63.8 ± 2.2; the femorotibial angle was (184.8 ± 2.9)°; and the Insall-Salvati index was 1.03 ± 0.13. All the wounds healed primarily, and no complication of infection, osteofascial compartment syndrome, or deep vein thrombosis was observed. All patients were followed up for 6-18 months (mean, 12.6 months). Peroneal nerve paralysis was observed in 1 case (1 knee), and was cured after expectant treatment. Bone union time was 2.7-3.4 months (mean, 2.9 months). At 6 months after operation, the femorotibial angle was (173.8 ± 2.0)°, showing a significant difference compared with the preoperative value (t = 11.70, P = 0.00); the Insall-Salvati index was 1.04 ± 0.12, showing no significant difference compared with the preoperative value (t = -0.20, P = 0.85); and the HSS knee score was significantly increased to 88.9 ± 3.1 (t = -25.44, P = 0.00). At last follow-up, the results were excellent in 13 knees, good in 6 knees, and fair in 1 knee, and the excellent and good rate was 95%. A 3-D printed cutting block can greatly improve the accuracy of HTO, avoid repeated X-ray imaging and multiple osteotomies, shorten the operation time, and ensure better effectiveness for correction of varus knee with osteoarthritis.
Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL. However, research in model-based RL has not been very standardized. It is fairly common for authors to experiment with self-designed environments, and there are several separate lines of research, which are sometimes closed-source or not reproducible. Accordingly, it is an open question how these various existing MBRL algorithms perform relative to each other. To facilitate research in MBRL, in this paper we gather a wide collection of MBRL algorithms and propose over 18 benchmarking environments specially designed for MBRL. We benchmark these algorithms with unified problem settings, including noisy environments. Beyond cataloguing performance, we explore and unify the underlying algorithmic differences across MBRL algorithms. We characterize three key research challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma. Finally, to maximally facilitate future research on MBRL, we open-source our benchmark at http://www.cs.toronto.edu/~tingwuwang/mbrl.html.