POPCORN: Partially Observed Prediction Constrained Reinforcement Learning

Joseph Futoma, Michael C. Hughes, Finale Doshi-Velez

2020

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.

Keywords:

Computer science
Reinforcement learning
Machine learning
Artificial intelligence
