Analyzing causes of failures in the Global Research Network using active measurements

2010 
With the objective of better understanding how the global Internet should achieve an availability on the order of five nines, i.e. be available a fraction 0.99999 of the time, active measurements were performed between Norway and China through the Global Research Network. End-to-end downtime statistics were continuously collected during a 3-month period ending in mid-February 2010. In addition to periodically sending probe packets between the two measurement systems, traceroute was run every two minutes to identify the exact IP-level path between the end-points. The TTL (time-to-live) counter in the IP header, which is decremented by one at every hop, was also analyzed for each packet. Based on the collected data, the causes of the observed network failures were identified, and insight was gained into the processes preceding and following communication downtimes. We distinguish inter- and intradomain failures and, when possible, identify the exact link or Autonomous System where a certain event happened. The study shows that end-to-end path availability is mainly affected by interdomain failures and long BGP convergence times, as well as by series of events not straightforwardly explained by the anticipated (re)routing behavior.
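The abstract combines three measurement ideas: a downtime budget implied by an availability target, downtime detected from lost probe packets, and path changes hinted at by changes in the received TTL. The following is a hypothetical Python sketch of that arithmetic, not the authors' tooling. All function names are illustrative, and it assumes that probes are sent at a fixed interval and that the sender's initial TTL is known (64 is used only as an example value). Note that a TTL change only reveals a change in hop count; a reroute over a path of the same length is invisible to this check, which is one reason traceroute is also needed.

```python
"""Hypothetical sketch: availability budget, probe-based downtime, TTL hints."""

MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def allowed_downtime_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - availability) * period_minutes


def downtime_intervals(probes):
    """probes: list of (send_time, received) tuples sorted by send_time.

    Returns (start, end) for each maximal run of lost probes. `end` is the
    send time of the next received probe, or the send time of the last lost
    probe if the run reaches the end of the trace (an assumed convention).
    """
    intervals = []
    start = None
    last_lost = None
    for t, received in probes:
        if not received:
            if start is None:
                start = t  # first lost probe opens a downtime
            last_lost = t
        elif start is not None:
            intervals.append((start, t))  # a received probe closes it
            start = None
    if start is not None:
        intervals.append((start, last_lost))
    return intervals


def measured_availability(probes):
    """Fraction of probes that were received."""
    return sum(1 for _, received in probes if received) / len(probes)


def hop_count(initial_ttl, received_ttl):
    """Hops travelled, since every router decrements the TTL by one."""
    return initial_ttl - received_ttl


def ttl_path_changes(received_ttls):
    """Indices where the received TTL differs from the previous packet's,
    i.e. where the hop count (and hence possibly the route) changed."""
    return [
        i for i in range(1, len(received_ttls))
        if received_ttls[i] != received_ttls[i - 1]
    ]


if __name__ == "__main__":
    print(round(allowed_downtime_minutes(0.99999), 2))  # 5.26 minutes per year
    probes = [(0, True), (1, False), (2, False), (3, True), (4, False)]
    print(downtime_intervals(probes))  # [(1, 3), (4, 4)]
    print(measured_availability(probes))  # 0.4
    ttls = [45, 45, 43, 43, 45]
    print([hop_count(64, t) for t in ttls])  # [19, 19, 21, 21, 19]
    print(ttl_path_changes(ttls))  # [2, 4]
```

The first line of the usage example shows why five nines is demanding: it leaves only about five minutes of downtime per year, so a single slow BGP convergence event can consume the whole budget.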