Reliability Models for the Internet of Things: A Paradigm Shift

2014 
More than 50 billion devices are expected to be internet-enabled by 2020 [1]. These devices, collectively referred to as the "Internet of Things" (IoT), are expected to become ubiquitous and to be involved in every aspect of life, from wearable devices to sensors monitoring industrial processes. The networking equipment connecting these devices will need to communicate seamlessly with several different software platforms, with software that is continuously upgraded. In addition, these devices will be exposed to unprecedented, highly variable external stimuli: harsh thermal fluctuations, fluids, moisture, vibration and shock. Networking products have traditionally been protected in data centers, where temperature, humidity and vibration are well controlled and software upgrades are well managed, with significant redundancy. Traditional networking devices are not designed for the unpredictable, varying environments that devices supporting the IoT ecosystem will endure. A paradigm shift is needed: new methodologies must be developed to characterize and estimate the system-level reliability of these devices. The traditional telecom industry requirement for hardware reliability is "5 Nines": 99.999%, which translates to 5 minutes and 15 seconds of downtime in a year. Software reliability is typically expected to be 99.95%, which translates to 4 hours, 22 minutes and 48 seconds of downtime in a year. The combined reliability of a system will need to incorporate both hardware and software reliability and capture the interaction between the two. Moreover, the product can be designed to be "self-aware", so that it adapts to changing use environments to maintain its target reliability. In this paper, we present a new methodology for estimating hardware and software reliability under uncertain use conditions and for deriving probabilistic estimates of overall system reliability.
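The downtime figures quoted for a given availability percentage follow from simple arithmetic. The sketch below shows that conversion, assuming a 365-day year and rounding to whole seconds. The function names `yearly_downtime` and `series_availability` are illustrative and do not come from the paper. The series combination is only a naive baseline: it assumes hardware and software fail independently, so it ignores the hardware/software interaction that the methodology aims to capture.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap, 365-day year


def yearly_downtime(availability_percent):
    """Return (days, hours, minutes, seconds) of downtime per year."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    total_seconds = round(unavailable_fraction * SECONDS_PER_YEAR)
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, seconds


def series_availability(*availability_percents):
    """Naive combined availability (percent) of independent parts in series."""
    combined = 1.0
    for percent in availability_percents:
        combined *= percent / 100.0
    return combined * 100.0


if __name__ == "__main__":
    print(yearly_downtime(99.999))  # (0, 0, 5, 15)
    print(yearly_downtime(99.95))   # (0, 4, 22, 48)
    print(yearly_downtime(99.5))    # (1, 19, 48, 0)
    print(series_availability(99.999, 99.95))
```

Under the independence assumption, the combined figure is dominated by the weaker of the two targets, which is why the software requirement matters so much at system level.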
The methodology is applied to illustrative case studies that estimate the impact of temperature variation on the reliability of two component types in a typical networking product: solder joint interconnects and fans. The methodology is then extended to software applications in a networking product, capturing the effects of distinct variables: the interaction between hardware and software, resource consumption (memory, processing, graphics etc.) and the delay between software and hardware updates. Numerical finite element models (FEM) are combined with statistical techniques and Monte Carlo simulations to develop a reliability prediction framework. An analytical framework is outlined to capture the asynchronous variation of software upgrades with hardware changes. The models developed can then be used to perform sensitivity studies that determine which factors are most influential in degrading reliability and rank them. This in turn helps identify the specific hardware component or software issue to focus on in order to meet target reliability goals. The models can also be used to derive the optimal design window, which engineers can use to design their IoT products while ensuring that reliability targets are met. Engineers can, for instance, determine the optimal frequency of firmware updates or improve resource allocation for reliability. Finally, an overall framework is presented showing how this methodology can be extended to any electronic system in the IoT ecosystem and beyond.
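To illustrate how a Monte Carlo simulation turns uncertain use conditions into a probabilistic reliability estimate, here is a minimal sketch for a solder joint under thermal cycling. It is not the paper's model. A closed-form Coffin-Manson-style power law stands in for the finite element model, and every name and number (`coeff`, `exponent`, the lognormal scatter, the temperature-swing distribution, the 500-cycle mission) is a hypothetical placeholder chosen only for illustration.

```python
import random


def cycles_to_failure(delta_t, coeff=1.0e6, exponent=2.0):
    """Coffin-Manson-style stand-in: N_f = coeff * delta_t ** -exponent.

    coeff and exponent are hypothetical, not fitted to any real solder alloy.
    """
    return coeff * delta_t ** (-exponent)


def failure_probability(mean_dt, sd_dt, mission_cycles,
                        n_samples=20000, seed=0):
    """Estimate P(failure before mission_cycles) by Monte Carlo sampling.

    The temperature swing per cycle (deg C) is drawn from a normal
    distribution, and lognormal scatter is applied to the predicted life.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    failures = 0
    for _ in range(n_samples):
        # Clamp to 1 deg C so the power law never sees a non-positive swing.
        delta_t = max(rng.gauss(mean_dt, sd_dt), 1.0)
        life = cycles_to_failure(delta_t) * rng.lognormvariate(0.0, 0.3)
        if life < mission_cycles:
            failures += 1
    return failures / n_samples


if __name__ == "__main__":
    # Same mean swing, controlled versus highly variable environment.
    print("controlled:", failure_probability(40.0, 5.0, 500))
    print("variable:  ", failure_probability(40.0, 15.0, 500))
```

Rerunning the estimate while varying one input at a time (mean swing, spread, mission length) gives a crude version of the sensitivity ranking described above. With these placeholder values, widening the spread of the temperature swing raises the estimated failure probability even though the mean is unchanged.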