Moral Gridworlds: A Theoretical Proposal for Modeling Artificial Moral Cognition

2020 
I describe a suite of reinforcement learning environments in which artificial agents learn to value and respond to moral content and contexts. I illustrate the core principles of the framework by characterizing one such environment, or “gridworld,” in which an agent learns to trade off between monetary profit and fair dealing, as operationalized in a standard behavioral-economics paradigm. I then highlight the core technical and philosophical advantages of this learning-based approach for modeling moral cognition and for addressing the so-called value alignment problem in AI.
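
To make the proposal concrete, below is a minimal, illustrative sketch (not the paper's implementation) of how such a trade-off can be posed as a reinforcement learning problem: a tabular Q-learning agent repeatedly chooses how to split a monetary endowment, and its reward combines profit with a penalty for unequal splits. The class name FairSplitEnv, the fairness_weight parameter, and the reduction of the gridworld to a single-step split choice are assumptions made purely for illustration.

import random

class FairSplitEnv:
    """Toy one-step task: the agent splits an endowment with a partner.

    Action a in {0, ..., endowment} is the amount the agent keeps.
    Reward = amount kept - fairness_weight * |kept - given|,
    i.e. monetary profit minus a penalty for unequal splits.
    """

    def __init__(self, endowment=10, fairness_weight=0.6):
        self.endowment = endowment
        self.fairness_weight = fairness_weight
        self.n_actions = endowment + 1

    def step(self, action):
        kept = action
        given = self.endowment - action
        return kept - self.fairness_weight * abs(kept - given)

def q_learning(env, episodes=5000, alpha=0.1, epsilon=0.1):
    """Tabular Q-learning over the single-state split task."""
    q = [0.0] * env.n_actions
    for _ in range(episodes):
        if random.random() < epsilon:
            a = random.randrange(env.n_actions)      # explore
        else:
            a = max(range(env.n_actions), key=lambda i: q[i])  # exploit
        r = env.step(a)
        q[a] += alpha * (r - q[a])  # one-step task: no bootstrapped next state
    return q

if __name__ == "__main__":
    env = FairSplitEnv(endowment=10, fairness_weight=0.6)
    q = q_learning(env)
    best = max(range(env.n_actions), key=lambda i: q[i])
    print(f"Learned split: keep {best}, give {env.endowment - best}")

Under these illustrative assumptions, a fairness_weight above 0.5 makes the per-unit penalty for inequality outweigh the per-unit profit, so the learned policy converges to an equal split; below 0.5 the agent keeps the full endowment, which makes the profit/fairness trade-off explicit.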