Challenges in Using ML for Networking Research: How to Label If You Must

2020 
Leveraging innovations in Machine Learning (ML) research is of great current interest to researchers across the sciences, including networking research. However, using ML for networking poses challenging new problems that have slowed both the pace of innovation and the adoption of ML in the networking domain. Among the main problems are a well-known lack of data in general and of representative data in particular, an overall inability to label data at scale, unknown data quality due to differences in data collection strategies, and data privacy issues that are unique to network data. Motivated by these challenges, we describe the design of Emerge, a novel framework to support efforts to dEmocratize the use of ML for nEtwoRkinG rEsearch. In particular, Emerge focuses on the problem of providing a low-cost, scalable, and high-quality methodology for labeling networking data. To illustrate the benefits of Emerge, we use publicly available network measurement datasets from CAIDA's Ark project and create and evaluate data labels for them in a programmable fashion.