Structural image and video understanding

2016 
In this thesis, we have discussed how to exploit structure in several computer vision topics. Five chapters addressed five computer vision topics using image structure. In Chapter 2, we proposed a structural model to jointly predict the age, expression, and gender of a face. By modeling the facial regions with latent variables, we learned the relationships among the different variables and improved the prediction accuracy of each task. In Chapter 3, we proposed a framework for generating a 3D reconstruction from a single image. We first predicted the geometric structure of a scene, then generated the 3D layout of the image using the predicted geometric structure as prior information. In Chapter 4, we extracted the primary object across videos. By building a graphical model on top of several videos, we extracted the primary object from each frame. In Chapter 5, we used deep learning to learn the features and structures of images in order to predict the illuminant of a scene. Since deep learning benefits from large training datasets, we proposed a data augmentation method to generate a large number of training images. In Chapter 6, we presented an image alignment algorithm. In that work, we proposed a piecewise algorithm to address the problem of aligning images with large viewpoint differences and non-planar scenes.