Unsupervised Discovery of 3D Physical Objects from Video

Yilun Du, Kevin Smith, Tomer Ullman, Joshua B. Tenenbaum, Jiajun Wu

2020

We study the problem of unsupervised physical object discovery. Unlike existing frameworks that aim to learn to decompose scenes into 2D segments purely based on each object's appearance, we explore how physics, especially object interactions, facilitates learning to disentangle and segment instances from raw videos, and to infer the 3D geometry and position of each object, all without supervision. Drawing inspiration from developmental psychology, our Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues and physical motion cues to accurately segment observable and partially occluded objects of varying sizes, and infer properties of those objects. Our model reliably segments objects on both synthetic and real scenes. The discovered object properties can also be used to reason about physical events.

Keywords:

motion cues
Pixel
Artificial intelligence
3D geometry
Property (programming)
Observable
Pattern recognition
Computer science
