Online Adjustment of Two-stage Inference for Knowledge Caching

2019 
With the rapid development of deep learning, many deep learning based smart services have emerged. These services usually consist of two components: a front-end user device and a back-end cloud server. The front-end device only collects queries from the user and sends them to the server, where all deep learning computation is performed. This design increases the load on the server and risks violating personal privacy. Knowledge caching, proposed in our prior work, mitigates these issues by processing deep learning inference for users' frequently used queries on the front-end device. In this paper, in addition to caching a deep learning model for frequently used and privacy-related queries on the front-end device, we extend our prior work by implementing the system on a physical device and server. In particular, we design an online adjustment system that manages the status of devices and servers. This allows us to specify the process of caching the model, updating the cached model, and operating the entire system. We evaluate the system, composed of an NVIDIA Jetson Tegra X2 as the front-end device and a back-end server with a TITAN Xp, to confirm its feasibility. The evaluation shows that our new system achieves better accuracy than the general model on the server.
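To make the two-stage inference flow concrete, the following is a minimal Python sketch of the routing the abstract describes: a confident prediction from the cached on-device model is returned directly, and anything else falls back to the general model on the server. The class names, the confidence threshold, and the placeholder predictors are illustrative assumptions, not the paper's actual implementation.

```python
import random  # stand-in for a real model's confidence score


class KnowledgeCache:
    """Front-end cache holding a small model specialized to the user's
    frequently used (and privacy-related) queries."""

    def __init__(self, confidence_threshold=0.9):
        # Assumed tunable cutoff; the paper's online adjustment system
        # would manage when the cached model is refreshed.
        self.confidence_threshold = confidence_threshold

    def predict(self, query):
        # A real cached model would run local inference here; we return
        # a fake (label, confidence) pair purely for illustration.
        return "cached_label", random.random()


def server_predict(query):
    # Placeholder for the remote call to the general model on the
    # back-end server (e.g., over HTTP or gRPC).
    return "server_label"


def two_stage_inference(query, cache):
    """Stage 1: try the cached on-device model. Stage 2: if it is not
    confident enough, defer to the back-end server."""
    label, confidence = cache.predict(query)
    if confidence >= cache.confidence_threshold:
        return label  # served on-device: lower latency, query stays local
    return server_predict(query)  # cache miss: fall back to the server


if __name__ == "__main__":
    cache = KnowledgeCache(confidence_threshold=0.9)
    print(two_stage_inference("example query", cache))
```

Under this sketch, only low-confidence queries ever leave the device, which is the mechanism behind the reduced server load and the privacy benefit claimed above.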