Model-Driven Simulations for Computer Vision

2017 
There is growing interest in using Computer Graphics (CG) renderings to generate large-scale annotated data for training machine learning systems, such as deep convolutional neural networks, for Computer Vision (CV). However, the usefulness of CG-generated data for tuning CV systems has been debated since the 1980s. In particular, it remains unclear how modeling errors and computational rendering approximations, arising from choices in the rendering pipeline, affect the generalization performance of trained CV systems. In this paper, we present a case study on traffic scenes to empirically analyze the performance degradation when CV systems trained on virtual data are transferred to real data. We: a) discuss a generative model coupled with 3D CAD shapes for scene-instance synthesis, and b) explore system performance trade-offs due to the choice of rendering engine (e.g., Lambertian shading (LS), ray tracing (RT), and Monte Carlo path tracing (MCPT)) and their respective parameters. DeepLab, which performs semantic segmentation, is chosen as the CV system under evaluation. In our case study, when the CV system is trained on CG data samples (rendered with MCPT or RT) and augmented with only 10% of the real-world training data from the Cityscapes dataset, it reaches performance comparable to training DeepLab on the complete Cityscapes dataset. Using samples rendered with LS degraded DeepLab's performance by 20%. Physics-based MCPT rendering improved performance by 6%, but at the cost of more than 3 times the rendering time.
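To make the scene-instance synthesis step concrete, the following is a minimal Python sketch of how a generative model over scene parameters might be coupled with a library of 3D CAD shapes. All names and priors here (CAD_SHAPES, sample_scene_instance, the uniform and Gaussian ranges) are illustrative assumptions, not the authors' actual implementation.

    import random

    # Illustrative catalog of CAD shape identifiers (hypothetical).
    CAD_SHAPES = ["sedan", "hatchback", "truck", "pedestrian", "tree"]

    def sample_scene_instance(road_length=100.0):
        """Draw one traffic-scene instance from simple priors over layout and light."""
        n_objects = random.randint(3, 12)  # prior on scene clutter
        objects = []
        for _ in range(n_objects):
            objects.append({
                "shape": random.choice(CAD_SHAPES),             # pick a CAD model
                "position": (random.uniform(0.0, road_length),  # along the road
                             random.uniform(-4.0, 4.0)),        # lateral offset
                "yaw_deg": random.gauss(0.0, 10.0),             # heading noise
            })
        lighting = {
            "sun_elevation_deg": random.uniform(10.0, 80.0),
            "sun_azimuth_deg": random.uniform(0.0, 360.0),
        }
        return {"objects": objects, "lighting": lighting}

Each sampled instance would then be handed to the rendering engine of choice, which produces both the image and its pixel-accurate semantic labels.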
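The gap between the three rendering engines comes down to how much light transport each simulates. A Lambertian shader evaluates only local diffuse reflection, L_o = (albedo / pi) * E * max(0, n . l); ray tracing adds shadows and specular effects, and Monte Carlo path tracing stochastically integrates global illumination. The sketch below shows just the Lambertian term under a single directional light; it is a standard textbook formulation, not code from the paper.

    import numpy as np

    def lambertian_radiance(albedo, normal, light_dir, irradiance=1.0):
        """Outgoing radiance under the Lambertian model.

        Computes (albedo / pi) * E * max(0, n . l). This purely local model
        ignores shadows, interreflections, and specular highlights, which RT
        and MCPT progressively add back at higher rendering cost.
        """
        n = normal / np.linalg.norm(normal)
        l = light_dir / np.linalg.norm(light_dir)
        cos_theta = max(0.0, float(np.dot(n, l)))
        return (albedo / np.pi) * irradiance * cos_theta

    # Example: a gray patch lit 45 degrees off its surface normal.
    print(lambertian_radiance(0.5, np.array([0.0, 0.0, 1.0]),
                              np.array([0.0, 1.0, 1.0])))

The missing global effects (cast shadows, interreflections) are plausible sources of the 20% performance drop observed with LS-rendered training data.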
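The 10% augmentation protocol can be viewed as simple dataset mixing: all synthetic renders plus a small random subset of the real Cityscapes training split. A minimal sketch, assuming PyTorch-style segmentation datasets that yield (image, label) pairs (synthetic_ds and cityscapes_ds are placeholders, not the authors' code):

    import random
    from torch.utils.data import ConcatDataset, Subset

    def mixed_training_set(synthetic_ds, cityscapes_ds,
                           real_fraction=0.10, seed=0):
        """Combine all synthetic samples with a fraction of the real data."""
        rng = random.Random(seed)
        n_real = int(len(cityscapes_ds) * real_fraction)
        real_indices = rng.sample(range(len(cityscapes_ds)), n_real)
        return ConcatDataset([synthetic_ds, Subset(cityscapes_ds, real_indices)])

Training DeepLab on such a mixed set is what, per the abstract, matches the performance of training on the full real dataset when the synthetic portion is rendered with RT or MCPT.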