Proposal of Time-based evaluation for Universal Sensor Evaluation Index in Self-generation of Reward

2020 
Designing a reward function for Reinforcement Learning is tedious because a new reward function must be designed for each environment. Self-Generation of Reward (SGR) addresses this by having the agent create its own reward from changes in its surroundings, rather than depending on a reward produced by the environment. SGR achieves this by perceiving changes through sensors, similar to how living things perceive changes in their environment as stimuli. Sensor inputs are evaluated using the Universal Sensor Evaluation Index (USEI) before being converted into rewards. The current USEI uses only strength and predictability evaluation, which makes danger detection inaccurate in certain environments. To make the evaluation more accurate, we propose including a time-based evaluation in USEI. The previous and proposed evaluation indices are both tested using a maze-like environment and Q-learning, and their performance is then compared.
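As a rough illustration of the pipeline described above, the following minimal sketch shows how a self-generated reward could feed a tabular Q-learning update. It is not the USEI formulation: the abstract does not give the evaluation formulas, so the weighted mean in `self_generated_reward`, the assumed score range of [-1, 1], and all function and parameter names are hypothetical. Only the Q-learning update rule is the standard one.

```python
def self_generated_reward(strength, predictability, time_score,
                          weights=(1.0, 1.0, 1.0)):
    """Combine three sensor evaluation scores into one reward.

    ASSUMPTION: each score lies in [-1, 1] and the combination is a
    weighted mean. The source does not specify how USEI combines them.
    """
    w_s, w_p, w_t = weights
    total = w_s + w_p + w_t
    return (w_s * strength + w_p * predictability + w_t * time_score) / total


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on a dict-backed Q-table.

    The reward here is whatever the agent generated for itself,
    not a value supplied by the environment.
    """
    # Greedy value of the next state; unseen pairs default to 0.0.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Usage: one step in a maze-like grid, with hypothetical sensor scores.
actions = ["up", "down", "left", "right"]
q_table = {}
reward = self_generated_reward(strength=-1.0, predictability=0.0, time_score=1.0)
q_update(q_table, state=(0, 0), action="right", reward=reward,
         next_state=(0, 1), actions=actions)
```

Keeping the reward computation separate from the learning rule means the previous index (strength and predictability only) and the proposed one (with the time-based term) can be swapped in without changing the Q-learning code, which mirrors the comparison described in the abstract.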