Self-generation of reward based on sensor values: Improving reward accuracy by associating multiple sensors using Hebb's rule

2020 
Reinforcement learning (RL) is a method in which an agent learns a desired behavior through interaction with its environment. The agent learns actions based on a reward, which is conventionally designed by a human in advance. Because the reward must be redefined every time the environment or the purpose changes, such an agent cannot adapt to a variety of environments. In previous research, we therefore proposed a method that evaluates sensor input values using a Universal Sensor Evaluation Index (USEI), applicable in any environment, and generates a self-reward based on that evaluation. However, because the previous method evaluates only a single sensor, an input can be evaluated only when it is received directly, and information that could be inferred from other sensors is ignored; as a result, the accuracy of the generated reward is low. If the reward accuracy is low, the robot may be damaged, because it cannot recognize danger except by actually receiving the dangerous input. In this study, we propose a method for generating highly accurate rewards by adding associations between individual sensors using Hebb's rule, which models the plasticity of neuronal synapses, and by evaluating inputs using multiple sensors. With the proposed method, inputs can be evaluated while taking multiple sensor inputs into account, the range of danger recognition is broadened, and the implementation of danger prediction can be expected.
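For illustration only, the sketch below shows one way Hebbian association between sensors could be combined with a per-sensor evaluation to produce a self-generated reward. The class names, the decay term, and the `usei_placeholder` evaluation function are assumptions for this example; the paper's actual USEI formulation and update rule are not reproduced here.

```python
import numpy as np

class HebbianSensorAssociation:
    """Hypothetical sketch: learn associations between sensors with Hebb's rule
    and spread each sensor's evaluation to its associated sensors."""

    def __init__(self, n_sensors, learning_rate=0.01, decay=0.001):
        self.w = np.zeros((n_sensors, n_sensors))  # association weights between sensors
        self.eta = learning_rate
        self.decay = decay

    def update(self, x):
        """Hebbian update: sensors that are active together strengthen their association."""
        x = np.asarray(x, dtype=float)
        self.w += self.eta * np.outer(x, x)        # delta w_ij = eta * x_i * x_j
        np.fill_diagonal(self.w, 0.0)              # ignore self-association
        self.w *= (1.0 - self.decay)               # slow decay keeps weights bounded

    def reward(self, x, usei):
        """Combine each sensor's own evaluation with evaluations propagated
        from associated sensors to form the self-generated reward."""
        direct = usei(np.asarray(x, dtype=float))   # per-sensor evaluation (placeholder USEI)
        associated = self.w @ direct                # evaluation spread via learned associations
        return float(np.sum(direct + associated))


def usei_placeholder(x, danger_threshold=0.8):
    """Assumed stand-in evaluation: penalize sensor values in a 'dangerous' range."""
    return -np.clip(np.abs(x) - danger_threshold, 0.0, None)


if __name__ == "__main__":
    assoc = HebbianSensorAssociation(n_sensors=3)
    rng = np.random.default_rng(0)
    for _ in range(1000):
        touch = rng.random()
        vision = touch + 0.1 * rng.standard_normal()   # vision correlates with touch
        other = rng.random()
        assoc.update([touch, vision, other])
    # After learning, a high vision reading lowers the reward even before the
    # touch sensor itself receives a dangerous input.
    print(assoc.reward([0.2, 0.95, 0.1], usei_placeholder))
```

In this toy setup, the correlated vision sensor acquires a strong association with the touch sensor, so a threatening visual input reduces the reward before contact occurs, which corresponds to the broadened danger recognition described above.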