The RAS Implications of DIMM Connector Failure Rates in Large, Highly Available Server Systems

2007 
The use of low-cost dual inline memory module (DIMM) connectors in highly reliable servers has created a difficult reliability, availability, and serviceability (RAS) conundrum: the connector cost must be low enough to allow hundreds of sockets to be used per system, while the system-level reliability must be high enough to prevent connector-related memory failures. This paper explores some of the modeling techniques that can guide system-level fault-tolerance decisions, given the propensity of card-edge connectors to experience corrosion-induced failures, and it explains why understanding the probability density function (PDF) of the connector failure rate is crucial in establishing the system RAS strategy for DIMM connectors. The effects of both a "low" and a "high" contact failure rate are analyzed under two different PDFs, and the resulting system implications are discussed.
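To illustrate why the shape of the failure distribution matters as much as the rate itself, the following is a minimal sketch and not the paper's own model. It assumes a Weibull time-to-failure distribution per contact, a series system in which any single contact failure counts as a system failure, and purely hypothetical values for the contact count, service life, and "low" and "high" failure rates (expressed in FIT, failures per 10^9 contact-hours). Each shape is scaled to give the same expected number of failures per contact at end of life, so only the timing of failures differs.

```python
import math

# All numbers below are illustrative assumptions, not data from the paper.
LIFE_HOURS = 5 * 8760      # assumed 5-year service life
N_CONTACTS = 64 * 240      # assumed 64 sockets with 240 contacts each


def cumulative_hazard(t, life, fit, shape):
    """Weibull cumulative hazard for one contact at time t (hours).

    Scaled so that every shape yields fit * 1e-9 * life expected
    failures per contact at t == life. shape == 1 is the constant-rate
    (exponential) case; shape > 1 models wear-out, e.g. slow corrosion.
    """
    return fit * 1e-9 * life * (t / life) ** shape


def system_failure_prob(t, life, fit, shape, n_contacts):
    """Probability that at least one of n_contacts has failed by time t."""
    total_hazard = n_contacts * cumulative_hazard(t, life, fit, shape)
    return -math.expm1(-total_hazard)  # equals 1 - exp(-total_hazard)


for label, fit in (("low", 0.01), ("high", 1.0)):   # hypothetical FIT rates
    for shape in (1.0, 3.0):                        # hypothetical PDF shapes
        for years in (1, 3, 5):
            p = system_failure_prob(years * 8760, LIFE_HOURS, fit, shape, N_CONTACTS)
            print(f"{label:>4} rate, shape={shape}, year {years}: P(fail)={p:.4%}")
```

Under these assumptions both shapes reach the same system failure probability at end of life, but the wear-out shape concentrates failures late in life while the constant-rate shape spreads them evenly, which would call for different fault-tolerance and service strategies.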