RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference Using a Diverse Pool of Cloud Computing Instances

2021 
Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces Ribbon, a novel deep learning inference serving system that balances two competing objectives: meeting a quality-of-service (QoS) target and remaining cost-effective. The key idea behind Ribbon is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. Ribbon devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms. Ribbon outperforms existing inference serving systems that use homogeneous instance pools, saving up to 16% of the inference service cost across different learning models, including emerging deep learning recommender system models and drug-discovery enabling models.
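To make the underlying trade-off concrete, the following is a minimal sketch of the pool-selection problem the abstract describes: choose how many instances of each type to rent so that a QoS condition holds at the lowest hourly cost. It is not Ribbon's method. It replaces the Bayesian Optimization search with a plain exhaustive search, and the instance names, prices, per-instance capacities, and the capacity-based QoS check are all hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical instance catalog (NOT from the paper):
# name -> (price in cents per hour, queries/sec one instance serves within the latency target)
INSTANCE_TYPES = {
    "type_a": (100, 100),
    "type_b": (40, 35),
    "type_c": (15, 10),
}


def pool_cost(pool):
    """Hourly cost in cents of a pool given as {instance name: count}."""
    return sum(INSTANCE_TYPES[name][0] * count for name, count in pool.items())


def meets_qos(pool, load_qps):
    """Assumed QoS model: the pool's aggregate capacity must cover the load."""
    capacity = sum(INSTANCE_TYPES[name][1] * count for name, count in pool.items())
    return capacity >= load_qps


def cheapest_pool(load_qps, max_per_type=5, allowed=None):
    """Exhaustively search instance counts; return (cost, pool) or None if infeasible.

    Exhaustive search stands in for the Bayesian Optimization strategy
    named in the abstract, which would avoid evaluating every configuration.
    """
    names = list(allowed or INSTANCE_TYPES)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        pool = dict(zip(names, counts))
        if not meets_qos(pool, load_qps):
            continue
        cost = pool_cost(pool)
        if best is None or cost < best[0]:
            best = (cost, pool)
    return best


if __name__ == "__main__":
    print("heterogeneous:", cheapest_pool(250))
    print("homogeneous:  ", cheapest_pool(250, allowed=["type_a"]))
```

With these made-up numbers, the mixed pool serves a 250 queries/sec load for 270 cents per hour, while a pool restricted to `type_a` needs 300 cents per hour. The gap comes from granularity: smaller instances fill the remaining capacity without paying for a whole large instance. A real system would have to measure QoS (for example, tail latency) rather than assume capacities add up linearly.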