Zwei: A Self-play Reinforcement Learning Framework for Video Transmission Services

2021 
Video transmission services adopt adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated with a function that linearly combines several weighted metrics. However, we observe that such a function often fails to describe the requirement accurately, so the methods it produces can violate the actual requirement. We propose Zwei, a self-play reinforcement learning framework that updates the policy by directly using the actual requirement. Technically, Zwei rolls out trajectories from the same initial state and instantly estimates the win rate from the competition outcome, where the outcome indicates which trajectory is closer to the assigned requirement. We evaluate Zwei with different requirements on various video transmission tasks, including adaptive bitrate streaming, crowd-sourced live streaming scheduling, and real-time communication. Results indicate that Zwei faithfully optimizes itself according to the assigned requirement and outperforms state-of-the-art methods under all considered scenarios. We further propose Zwei+, which enables Zwei to learn policies in the vanilla no-regret reinforcement learning scenario. We validate Zwei+ on the adaptive bitrate streaming task and show that it outperforms existing state-of-the-art approaches.
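The win-rate estimate described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The outcome fields (`rebuffer_s`, `bitrate_kbps`), the example requirement (less rebuffering first, then higher bitrate), the function names, and the choice to count a tie as half a win are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence

# Hypothetical per-trajectory summary: total rebuffering time and mean bitrate.
Outcome = Dict[str, float]


def closer_to_requirement(a: Outcome, b: Outcome) -> int:
    """Return 1 if `a` wins, -1 if `b` wins, 0 for a tie.

    Assumed example requirement: less rebuffering first, then higher bitrate.
    """
    if a["rebuffer_s"] != b["rebuffer_s"]:
        return 1 if a["rebuffer_s"] < b["rebuffer_s"] else -1
    if a["bitrate_kbps"] != b["bitrate_kbps"]:
        return 1 if a["bitrate_kbps"] > b["bitrate_kbps"] else -1
    return 0


def estimate_win_rates(
    outcomes: Sequence[Outcome],
    compare: Callable[[Outcome, Outcome], int],
) -> List[float]:
    """Pairwise-compare trajectories rolled out from the same initial state.

    Each trajectory's win rate is its share of wins over all opponents.
    Counting a tie as half a win is an assumption of this sketch.
    """
    n = len(outcomes)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):
        result = compare(outcomes[i], outcomes[j])
        if result > 0:
            scores[i] += 1.0
        elif result < 0:
            scores[j] += 1.0
        else:
            scores[i] += 0.5
            scores[j] += 0.5
    return [s / (n - 1) for s in scores]


# Three made-up rollouts that start from the same state.
rollouts = [
    {"rebuffer_s": 0.0, "bitrate_kbps": 1200.0},
    {"rebuffer_s": 0.0, "bitrate_kbps": 1850.0},
    {"rebuffer_s": 2.5, "bitrate_kbps": 2850.0},
]
win_rates = estimate_win_rates(rollouts, closer_to_requirement)
print(win_rates)  # [0.5, 1.0, 0.0]
```

In this sketch the requirement is a comparison rule, not a weighted sum, so the third rollout loses every battle despite having the highest bitrate. The resulting win rates could then serve as the learning signal for a policy update, which is not shown here.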