Moral Gridworlds: A Theoretical Proposal for Modeling Artificial Moral Cognition
2020
I describe a suite of reinforcement learning environments in which artificial agents learn to value and respond to moral content and contexts. I illustrate the core principles of the framework by characterizing one such environment, or “gridworld,” in which an agent learns to trade off monetary profit against fair dealing, as applied in a standard behavioral economic paradigm. I then highlight the core technical and philosophical advantages of the learning approach for modeling moral cognition and for addressing the so-called value alignment problem in AI.
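The profit–fairness trade-off described above can be sketched as a toy learning problem. The following is a minimal illustration, not the paper's actual environment: an agent repeatedly decides how much of a 10-unit pie to keep (an ultimatum-style split, one standard behavioral economic paradigm), and a hypothetical reward function scores monetary profit minus a penalty for deviating from an even split. The reward shape, the `fairness_weight` parameter, and the simple bandit-style learner are all illustrative assumptions.

```python
import random

PIE = 10  # units available to split between the agent and a partner


def reward(keep, fairness_weight):
    """Hypothetical reward: profit (units kept) minus a penalty
    proportional to the deviation from an even split. This is one
    possible way to encode 'fair dealing', not the paper's own spec."""
    return keep - fairness_weight * abs(keep - PIE / 2)


def train(fairness_weight, episodes=5000, eps=0.1, lr=0.1, seed=0):
    """Epsilon-greedy value learning over the 11 'amount kept' actions.
    Returns the action the trained agent prefers."""
    rng = random.Random(seed)
    q = [0.0] * (PIE + 1)  # one value estimate per action (keep 0..10)
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(PIE + 1)  # explore
        else:
            a = max(range(PIE + 1), key=q.__getitem__)  # exploit
        q[a] += lr * (reward(a, fairness_weight) - q[a])
    return max(range(PIE + 1), key=q.__getitem__)
```

With a weak fairness penalty the learned policy keeps the whole pie; with a strong one it converges on the even split, illustrating how a single scalar in the reward function shifts the agent between selfish and fair behavior.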