Meta-reinforcement learning in a thalamo-orbitofrontal circuit

2020 
Learning to predict rewards is essential for the survival of animals. Contemporary views suggest that such learning is driven by a reward prediction error, the difference between received and predicted rewards. Here we show, using two-photon calcium imaging and optogenetics in mice, that a different class of reward learning signals exists within the orbitofrontal cortex (OFC). Specifically, the reward responses of many OFC neurons exhibit plasticity consistent with filtering out rewards that are less salient for learning (such as predicted rewards, or unpredicted rewards available in a context containing highly salient aversive stimuli). We show, using quasi-simultaneous imaging and optogenetics, that this reward response plasticity is sculpted by medial thalamic inputs to OFC. These results provide a biological substrate for emerging theoretical views of meta-reinforcement learning in prefrontal cortex.