Cost Effective Troubleshooting of NFV Infrastructure

2020 
Fine-grained telemetry data enables troubleshooting teams to pinpoint the root cause of many faults in NFVI products. However, the volume of telemetry data rapidly accumulates, which makes it expensive to retain indefinitely. This work performs automatic analysis on the telemetry data, and only retains telemetry data if we suspect that it might contain a fault. Our evaluation uses 313 real application traces and shows that we can retain all faulty traces, and reduce the volume of stored data by over 99 %. We also show three case studies of real errors detected (and explained) by our system. Finally, we also deployed our system in a controlled testing environment and demonstrated its ability to detect microburst faults on a variety of network devices.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    5
    References
    0
    Citations
    NaN
    KQI
    []