Paul N. Whatmough

Qualcomm (United States)

Author Statistics

Papers

Citation

H-Index

i-10 index

Research Field

Program Committee

Alaa R. Alameldeen Intel Mahdi Nazm Bojnordi Anirudh Badam Pradip Bose

10.1109/hpca.2019.00008

Cite

Citations (0)

AutoPilot: Automating SoC Design Space Exploration for SWaP Constrained Autonomous UAVs.

arXiv (Cornell University) (2021)

Srivatsan Krishnan Zishen Wan Kshitij Bhardwaj Paul N. Whatmough Aleksandra Faust

Building domain-specific accelerators for autonomous unmanned aerial vehicles (UAVs) is challenging due to a lack of systematic methodology for designing onboard compute. Balancing a computing system for a UAV requires considering both the cyber (e.g., sensor rate, compute performance) and physical (e.g., payload weight) characteristics that affect overall performance. Iterating over the many component choices results in a combinatorial explosion of the number of possible combinations: from 10s of thousands to billions, depending on implementation details. Manually selecting combinations of these components is tedious and expensive. To navigate the {cyber-physical design space} efficiently, we introduce \emph{AutoPilot}, a framework that automates full-system UAV co-design. AutoPilot uses Bayesian optimization to navigate a large design space and automatically select a combination of autonomy algorithm and hardware accelerator while considering the cross-product effect of other cyber and physical UAV components. We show that the AutoPilot methodology consistently outperforms general-purpose hardware selections like Xavier NX and Jetson TX2, as well as dedicated hardware accelerators built for autonomous UAVs, across a range of representative scenarios (three different UAV types and three deployment environments). Designs generated by AutoPilot increase the number of missions on average by up to 2.25x, 1.62x, and 1.43x for nano, micro, and mini-UAVs respectively over baselines. Our work demonstrates the need for holistic full-UAV co-design to achieve maximum overall UAV performance and the need for automated flows to simplify the design process for autonomous cyber-physical systems.

Autopilot

Payload (computing)

Cyber-physical system

Source

Cite

Citations (3)

A Wide Dynamic Range Sparse FC-DNN Processor with Multi-Cycle Banked SRAM Read and Adaptive Clocking in 16nm FinFET

ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC) (2018)

Sae Kyu Lee Paul N. Whatmough Niamh Mulholland Patrick Hansen David Brooks

Always-on classifiers for sensor data require a very wide operating range to support a variety of real-time workloads and must operate robustly at low supply voltages. We present a 16nm always-on wake-up controller with a fully-connected (FC) Deep Neural Network (DNN) accelerator that operates from 0.4-1 V. Calibration-free automatic voltage/frequency tuning is provided by tracking small non-zero Razor timing-error rates, and a novel timing-error driven sync-free fast adaptive clocking scheme provides resilience to on-chip supply voltage noise. The model access burden of neural networks is relaxed using a multicycle SRAM read, which allows memory voltage to be reduced at iso-throughput. The wide operating range allows for high performance at 1.36GHz, low-power consumption down to 750μW and state-of-the-art raw efficiency at 16-bit precision of 750 GOPS/W dense, or 1.81 TOPS/W sparse.

sync

10.1109/esscirc.2018.8494245

Cite

Citations (4)

A 3mm² Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm

Glenn G. Ko Yuji Chai Marco Donato Paul N. Whatmough Thierry Tambe

This paper describes a 16nm programmable accelerator for unsupervised probabilistic machine perception tasks that performs Bayesian inference on probabilistic models mapped onto a 2D Markov Random Field, using MCMC. Exploiting two degrees of parallelism, it performs Gibbs sampling inference at up to 1380× faster with 1965× less energy than an Arm Cortex-A53 on the same SoC, and 1.5× faster with 6.3× less energy than an embedded FPGA in the same technology. At 0.8V, it runs at 450MHz, producing 44.6 MSamples/s at 0.88 nJ/sample.

Gibbs sampling

Markov random field

10.1109/vlsicircuits18222.2020.9162784

Cite

Citations (13)

Foundations of Deep Learning

Synthesis lectures on computer architecture (2017)

Brandon Reagen Robert Adolf Paul N. Whatmough Gu-Yeon Wei David J. Brooks

10.1007/978-3-031-01756-8_2

Cite

Citations (0)

A Literature Survey and Review

Synthesis lectures on computer architecture (2017)

Brandon Reagen Robert Adolf Paul N. Whatmough Gu-Yeon Wei David J. Brooks

10.1007/978-3-031-01756-8_5

Cite

Citations (0)

Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective

arXiv (Cornell University) (2018)

Yuhao Zhu Matthew Mattina Paul N. Whatmough

Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR, ADAS, etc. Accordingly, hardware architects have designed customized hardware for machine learning algorithms, especially neural networks, to improve compute efficiency. However, machine learning is typically just one processing stage in complex end-to-end applications, involving multiple components in a mobile Systems-on-a-chip (SoC). Focusing only on ML accelerators loses bigger optimization opportunity at the system (SoC) level. This paper argues that hardware architects should expand the optimization scope to the entire SoC. We demonstrate one particular case-study in the domain of continuous computer vision where camera sensor, image signal processor (ISP), memory, and NN accelerator are synergistically co-designed to achieve optimal system-level efficiency.

Scope (computer science)

ARM architecture

10.48550/arxiv.1801.06274

Cite

Citations (18)

SCALE-Sim: Systolic CNN Accelerator.

arXiv (Cornell University) (2018)

Ananda Samajdar Yuhao Zhu Paul N. Whatmough Matthew Mattina Tushar Krishna

Systolic array

Source

Cite

Citations (60)

Ares

Brandon Reagen Udit Gupta Lillian Pentecost Paul N. Whatmough Sae Kyu Lee

As the use of deep neural networks continues to grow, so does the fraction of compute cycles devoted to their execution. This has led the CAD and architecture communities to devote considerable attention to building DNN hardware. Despite these efforts, the fault tolerance of DNNs has generally been overlooked. This paper is the first to conduct a large-scale, empirical study of DNN resilience. Motivated by the inherent algorithmic resilience of DNNs, we are interested in understanding the relationship between fault rate and model accuracy. To do so, we present Ares: a light-weight, DNN-specific fault injection framework validated within 12% of real hardware. We find that DNN fault tolerance varies by orders of magnitude with respect to model, layer type, and structure.

Resilience

Deep Neural Networks

Fraction (chemistry)

10.1145/3195970.3195997

Cite

Citations (215)

Deep Learning for Computer Architects

Synthesis lectures on computer architecture (2017)

Brandon Reagen Robert Adolf Paul N. Whatmough Gu-Yeon Wei David Brooks

Machine learning, and specifically deep learning, has been hugely disruptive in many fields of computer science. The success of deep learning techniques in solving notoriously difficult classification

10.1007/978-3-031-01756-8

Cite

Citations (26)