On-Policy Dataset Synthesis for Learning Robot Grasping Policies Using Fully Convolutional Deep Networks

2019 
Rapid and reliable robot grasping for a diverse set of objects has applications from warehouse automation to home de-cluttering. One promising approach is to learn deep policies from synthetic training datasets of point clouds, grasps, and rewards sampled using analytic models with stochastic noise models for domain randomization. In this letter, we explore how the distribution of synthetic training examples affects the rate and reliability of the learned robot policy. We propose a synthetic data sampling distribution that combines grasps sampled from the policy action set with guiding samples from a robust grasping supervisor that has full state knowledge. We use this to train a robot policy based on a fully convolutional network architecture that evaluates millions of grasp candidates in 4-DOF (3-D position and planar orientation). Physical robot experiments suggest that a policy based on fully convolutional grasp quality CNNs (FC-GQ-CNNs) can plan grasps in 0.625 s, considering 5000x more grasps than our prior policy based on iterative grasp sampling and evaluation. This computational efficiency improves rate and reliability, achieving 296 mean picks per hour (MPPH) compared to 250 MPPH for iterative policies. Sensitivity experiments explore the effect of supervisor guidance level and granularity of the policy action space. Code, datasets, videos, and supplementary material can be found at http://berkeleyautomation.github.io/fcgqcnn.
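The idea of mixing grasps from the policy action set with guiding samples from a supervisor can be illustrated with a minimal sketch. The abstract does not specify the exact form of the distribution, so everything here is an assumption made for illustration: the function name `sample_training_grasps`, the `guidance` probability (standing in for the "supervisor guidance level"), the uniform draws, and the small 4-DOF grid of `(x, y, depth, angle)` bins are all hypothetical and are not taken from the released code.

```python
import random


def sample_training_grasps(policy_actions, supervisor_grasps, n_samples,
                           guidance=0.5, seed=0):
    """Draw n_samples grasps from a two-component mixture (illustrative only).

    With probability `guidance` a grasp is drawn uniformly from the
    supervisor's grasps; otherwise it is drawn uniformly from the
    discretized policy action set. Each sample is tagged with its source.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        if supervisor_grasps and rng.random() < guidance:
            samples.append(("supervisor", rng.choice(supervisor_grasps)))
        else:
            samples.append(("policy", rng.choice(policy_actions)))
    return samples


# Hypothetical 4-DOF action grid: (x bin, y bin, depth bin, angle bin).
policy_actions = [(x, y, d, a)
                  for x in range(4) for y in range(4)
                  for d in range(2) for a in range(4)]
# Hypothetical grasps a full-state supervisor might rate as robust.
supervisor_grasps = [(1, 2, 0, 3), (3, 0, 1, 1)]

batch = sample_training_grasps(policy_actions, supervisor_grasps,
                               n_samples=1000, guidance=0.25)
frac = sum(src == "supervisor" for src, _ in batch) / len(batch)
print(f"fraction of samples from the supervisor: {frac:.2f}")
```

In this sketch, setting `guidance` to 0 yields a dataset drawn only from the policy action set, and setting it to 1 yields only supervisor samples, which is one simple way to picture a sweep over guidance level.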