Adaptive sample collection using active learning for kernel-based approximate policy iteration

2011 
Approximate policy iteration (API) has been shown to be a class of reinforcement learning methods with good stability and sample efficiency. However, sample collection remains an open problem that is critical to the performance of API methods. In this paper, a novel adaptive sample collection strategy using active-learning-based exploration is proposed to enhance the performance of kernel-based API. In this strategy, an online kernel-based least-squares policy iteration (KLSPI) method is adopted to construct nonlinear features and approximate the Q-function simultaneously, so that more representative samples can be obtained for value-function approximation. Simulation results on typical learning control problems show that the proposed strategy improves the performance of KLSPI remarkably.
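To make the idea concrete, the following is a minimal, illustrative sketch of kernel-based least-squares policy iteration with an approximate-linear-dependence (ALD) style novelty test, which is one plausible form an active-learning sample-selection criterion could take. All class, function, and parameter names here are assumptions for illustration, not the paper's actual code or exact algorithm.

```python
# Illustrative sketch (assumed names/parameters, not the paper's implementation):
# a kernel Q-function with ALD-style sparsification, where the ALD residual
# doubles as an active-learning signal for which samples are worth collecting.
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel over concatenated state-action vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

class KLSPISketch:
    def __init__(self, actions, sigma=1.0, ald_threshold=0.1, gamma=0.95):
        self.actions = actions       # discrete action set
        self.sigma = sigma           # kernel width
        self.nu = ald_threshold      # ALD sparsification threshold
        self.gamma = gamma           # discount factor
        self.dictionary = []         # retained (state, action) feature centers
        self.weights = np.zeros(0)   # Q-function weights over the dictionary

    def _features(self, s, a):
        """Kernel feature vector of (s, a) against the current dictionary."""
        z = np.concatenate([s, [a]])
        return np.array([rbf(z, d, self.sigma) for d in self.dictionary])

    def novelty(self, s, a):
        """ALD residual of projecting k((s,a), .) onto the dictionary span.
        A large residual marks a sample the learner should actively collect."""
        z = np.concatenate([s, [a]])
        if not self.dictionary:
            return 1.0
        K = np.array([[rbf(d1, d2, self.sigma) for d2 in self.dictionary]
                      for d1 in self.dictionary])
        k = self._features(s, a)
        c = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k)
        return rbf(z, z, self.sigma) - k @ c

    def maybe_add(self, s, a):
        """Grow the dictionary only with sufficiently novel samples."""
        if self.novelty(s, a) > self.nu:
            self.dictionary.append(np.concatenate([s, [a]]))
            self.weights = np.append(self.weights, 0.0)

    def q_value(self, s, a):
        if not self.dictionary:
            return 0.0
        return self._features(s, a) @ self.weights

    def policy(self, s):
        """Greedy action under the current kernel Q-function."""
        return max(self.actions, key=lambda a: self.q_value(s, a))

    def lstdq(self, samples, reg=1e-3):
        """One LSTD-Q policy-evaluation step over (s, a, r, s') samples."""
        m = len(self.dictionary)
        A = reg * np.eye(m)
        b = np.zeros(m)
        for (s, a, r, s_next) in samples:
            phi = self._features(s, a)
            phi_next = self._features(s_next, self.policy(s_next))
            A += np.outer(phi, phi - self.gamma * phi_next)
            b += r * phi
        self.weights = np.linalg.solve(A, b)
```

In an LSPI-style loop, one would alternate collecting transitions (preferring high-novelty state-action pairs via `novelty` and `maybe_add`), running `lstdq` to evaluate the current greedy policy, and acting greedily via `policy`; this reflects the abstract's claim that feature construction and Q-function approximation proceed simultaneously.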