A User Comfort Model and Index Policy for Personalizing Discrete Controller Decisions

2018 
User feedback allows for tailoring system operation to ensure individual user satisfaction. A major challenge in personalized decision-making is the systematic construction of a user model during operation while maintaining control performance. This paper presents both an index-based control policy to smartly collect and process user feedback and a user comfort model in the form of a Markov decision process with a priori unknown user-specific state transition probabilities. The control policy utilizes explicit user feedback to optimize a reward measure reflecting user comfort and addresses the explorationexploitation trade-off in a multi-armed bandit framework. The proposed approach combines restless bandits and upper confidence bound algorithms. It introduces an exploration term into the restless bandit formulation, utilizes user feedback to identify the user model, and is shown to be indexable. We demonstrate its capabilities with a simulation for learning a user's trade-off between comfort and energy usage.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    3
    Citations
    NaN
    KQI
    []