Reinforcement-learning agents with different temperature parameters explain the variety of human action-selection behavior in a Markov decision process task

Fumihiko Ishida, Takahiro Sasaki, Yutaka Sakaguchi, Hiroyuki Shimai

2009

We investigated the characteristics of human action selection in a Markov decision process (MDP) task and compared them with those of reinforcement-learning (RL) agents. The behavior of the human participants fell roughly into two qualitatively different types. Surprisingly, however, this variety of human behavior could be explained by a single parameter in the action-selection rules of the RL agents: the temperature parameter, which sets the degree of randomness. This result implies that the various behaviors seen in human action selection may be determined by a simple mechanism in the brain.
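To illustrate how a temperature parameter can control the degree of randomness in action selection, here is a minimal sketch of Boltzmann (softmax) action selection. The abstract does not state which selection rule the RL agents used, so the softmax rule, the function names `softmax_probs` and `select_action`, and the example action values are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def softmax_probs(q_values, temperature):
    """Return Boltzmann (softmax) action probabilities.

    Assumed rule (not confirmed by the source):
        P(a) = exp(Q(a) / T) / sum_b exp(Q(b) / T)
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum value before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical action values for a single state.
    q = [1.0, 0.5, 0.0]
    for t in (0.1, 1.0, 10.0):
        print(t, [round(p, 3) for p in softmax_probs(q, t)])
```

Under this assumed rule, a low temperature makes the agent choose the highest-valued action almost deterministically, while a high temperature pushes the choice toward uniform randomness. Varying this one parameter is the kind of mechanism the abstract describes as accounting for the different behavior types.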

Keywords:

Machine learning
Artificial intelligence
Reinforcement learning
Markov decision process
Action selection
Randomness
Partially observable Markov decision process
Pattern recognition
Mathematics
Single parameter
