Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning

2021 
We propose Contrastive Goal Grouping (COGOAL), a self-supervised goal embedding algorithm that learns a well-structured latent goal space to simplify goal-conditioned reinforcement learning. Compared to conventional reconstruction-based methods such as variational autoencoders, our approach can benefit from previously learnt goals and achieves better generalizability. More specifically, we theoretically prove a sufficient condition for determining whether goals share similar optimal policies, and propose COGOAL, which groups goals satisfying the condition in the latent space via contrastive learning. The learnt goal embeddings enable a policy fully trained on one goal to reach new goals that are adjacent in the latent space. We conduct experiments on visual navigation and visual object search tasks. COGOAL significantly outperforms the baseline methods in terms of sample efficiency on the visual object search task, in which a previously learnt policy is adaptively transferred to reach new goals via fine-tuning.
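The abstract does not spell out the exact training objective, but the core idea of grouping goals with similar optimal policies via contrastive learning can be sketched with a standard InfoNCE-style loss. The following is a minimal illustration, not the paper's implementation; the function name, tensor shapes, and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_goal_loss(anchor_emb, positive_emb, negative_embs, temperature=0.1):
    """InfoNCE-style sketch: pull embeddings of goals assumed to share
    similar optimal policies (positives) toward the anchor goal, and push
    other goals (negatives) away in the latent goal space.

    anchor_emb:    (B, D) embeddings of anchor goals
    positive_emb:  (B, D) embeddings of goals grouped with the anchors
    negative_embs: (B, K, D) embeddings of K negative goals per anchor
    """
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    negatives = F.normalize(negative_embs, dim=-1)

    # Cosine similarities scaled by temperature.
    pos_logits = (anchor * positive).sum(dim=-1, keepdim=True) / temperature  # (B, 1)
    neg_logits = torch.einsum('bd,bkd->bk', anchor, negatives) / temperature  # (B, K)

    # Treat the positive as class 0 in a (1 + K)-way classification problem.
    logits = torch.cat([pos_logits, neg_logits], dim=1)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

Under this sketch, positives would be goal pairs satisfying the paper's sufficient condition for sharing similar optimal policies, and negatives would be goals that do not; a policy trained for one goal can then be reused or fine-tuned for nearby goals in the learnt embedding space.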