Blockwise or Clustered Principal Component Analysis (CPCA) is commonly used to achieve real-time rendering of shadows and glossy reflections with precomputed radiance transfer (PRT). The vertices or pixels are partitioned into smaller coherent regions, and light transport in each region is approximated by a locally low-dimensional subspace using PCA. Many earlier techniques such as surface light field and reflectance field compression use a similar paradigm. However, there has been no clear theoretical understanding of how light transport dimensionality increases with local patch size, nor of the optimal block size or number of clusters. In this paper, we develop a theory of locally low-dimensional light transport, by using Szegő's eigenvalue theorem to analytically derive the eigenvalues of the covariance matrix for canonical cases. We show mathematically that for symmetric patches of area A, the number of basis functions for glossy reflections increases linearly with A, while for simple cast shadows, it often increases as √A. These results are confirmed numerically on a number of test scenes. Next, we carry out an analysis of the cost of rendering, trading off local dimensionality and the number of patches, deriving an optimal block size. Based on this analysis, we provide useful practical insights for setting parameters in CPCA and also derive a new adaptive subdivision algorithm. Moreover, we show that rendering time scales sub-linearly with the resolution of the image, allowing for interactive all-frequency relighting of 1024 × 1024 images.
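The per-cluster PCA approximation at the heart of CPCA can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the transport matrix is random, the cluster labels are assumed given (real systems cluster for coherence), and plain NumPy SVD stands in for the per-cluster PCA.

```python
import numpy as np

def clustered_pca(T, labels, k):
    """Approximate a light-transport matrix T (points x lighting coefficients)
    by a rank-k PCA model within each cluster of points."""
    approx = np.empty_like(T)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        block = T[idx]                     # rows of T belonging to cluster c
        mean = block.mean(axis=0)
        # local PCA via SVD of the mean-centered block
        U, S, Vt = np.linalg.svd(block - mean, full_matrices=False)
        r = min(k, len(S))
        approx[idx] = mean + (U[:, :r] * S[:r]) @ Vt[:r]
    return approx

# toy example: 200 "vertices" x 64 lighting coefficients, 4 clusters
rng = np.random.default_rng(0)
T = rng.standard_normal((200, 64))
labels = rng.integers(0, 4, size=200)
err_full = np.linalg.norm(T - clustered_pca(T, labels, 64))  # exact (full rank)
err_low = np.linalg.norm(T - clustered_pca(T, labels, 8))    # lossy rank-8 model
```

The rendering-cost trade-off analyzed in the paper lives in the choice of `k` versus the number of clusters: smaller clusters are more coherent (lower `k` suffices) but multiply per-cluster overhead.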
Imaging of objects under variable lighting directions is an important and frequent practice in computer vision, machine vision, and image-based rendering. Methods for such imaging have traditionally used only a single light source per acquired image, which can result in images that are too dark and noisy, e.g., due to the need to avoid saturation of highlights. We introduce an approach that can significantly improve the quality of such images, in which multiple light sources illuminate the object simultaneously from different directions. These illumination-multiplexed frames are then computationally demultiplexed. The approach is useful for imaging dim objects, as well as objects having a specular reflection component. For signal-independent noise, we give the optimal scheme by which lighting should be multiplexed to obtain the highest-quality output; the scheme is based on Hadamard codes. The consequences of imperfections such as stray light, saturation, and noisy illumination sources are then studied. In addition, the paper analyzes the implications of shot noise, which is signal-dependent, for Hadamard multiplexing. The approach facilitates practical lighting setups having high directional resolution. This is shown by a setup we devise, which is flexible, scalable, and programmable. We used it to demonstrate the benefit of multiplexing in experiments.
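Hadamard-code multiplexing can be illustrated with a small simulation. The S-matrix construction used below (drop the first row and column of a Sylvester Hadamard matrix, then map +1 → 0 and −1 → 1) is standard; the image size, noise level, and random seed are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

def sylvester_hadamard(m):
    """Hadamard matrix of order m (a power of two), Sylvester construction."""
    H = np.array([[1]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

# S-matrix of order n = m - 1: drop first row/column, map +1 -> 0, -1 -> 1.
# Each row opens m/2 of the n light sources simultaneously.
m = 8
S = (1 - sylvester_hadamard(m)[1:, 1:]) // 2

rng = np.random.default_rng(1)
single = rng.uniform(0.2, 1.0, size=(m - 1, 5))   # 7 single-source images (5 pixels)
sigma = 0.01                                       # additive, signal-independent noise

multiplexed = S @ single + sigma * rng.standard_normal(single.shape)
demux = np.linalg.solve(S, multiplexed)            # computational demultiplexing

direct = single + sigma * rng.standard_normal(single.shape)  # one source per frame
err_direct = np.sqrt(np.mean((direct - single) ** 2))
err_demux = np.sqrt(np.mean((demux - single) ** 2))
```

For additive signal-independent noise, demultiplexing with an order-n S-matrix reduces the noise roughly by a factor of (n + 1)/(2√n) relative to single-source capture (about 1.5 for n = 7), which is the gap the comparison above measures; the shot-noise analysis in the paper qualifies this gain when noise grows with signal level.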
A goal of image-based rendering is to synthesize man-made and natural objects as realistically as possible. The paper presents a method for image-based modeling and rendering of objects with arbitrary (possibly anisotropic and spatially varying) BRDFs. An object is modeled by sampling the surface's incident light field to reconstruct a non-parametric apparent BRDF at each visible point on the surface. This model can be used to render the object from the same viewpoint but under arbitrarily specified illumination. We demonstrate how these object models can be embedded in synthetic scenes and rendered under global illumination, which captures the interreflections between real and synthetic objects. We also show how these image-based models can be automatically composited onto video footage with dynamic illumination so that the effects (shadows and shading) of the lighting on the composited object match those of the scene.
How do you tell a blackbird from a crow? There has been great progress toward automatic methods for visual recognition, including fine-grained visual categorization in which the classes to be distinguished are very similar. In a task such as bird species recognition, automatic recognition systems can now exceed the performance of non-experts: most people are hard-pressed to name a couple dozen bird species, let alone identify them. This leads us to the question, "Can a recognition system show humans what to look for when identifying classes (in this case birds)?" In the context of fine-grained visual categorization, we show that we can automatically determine which classes are most visually similar, discover what visual features distinguish very similar classes, and illustrate the key features in a way meaningful to humans. Running these methods on a dataset of bird images, we can generate a visual field guide to birds which includes a tree of similarity that displays the similarity relations between all species, pages for each species showing the most similar other species, and pages for each pair of similar species illustrating their differences.
A new approach is introduced to 3-D parameterized object estimation and recognition. Though the theory is applicable to any parameterization, we use a model in which objects are approximated by patches of spheres, cylinders, and planes (primitive objects). These primitive surfaces are special cases of 3-D quadric surfaces. Primitive surface estimation is treated as parameter estimation using data patches in two or more noisy images taken by calibrated cameras in different locations and from different directions. Included is the case of a single moving camera. Though various techniques can be used to implement this nonlinear estimation, we discuss the use of gradient descent. Experiments are run and discussed for the case of a sphere of unknown location. It is shown that the estimation procedure can be viewed geometrically as a cross correlation of nonlinearly transformed image patches in two or more images. Approaches to segmenting object surfaces into primitive object surfaces, and to primitive object-type recognition, are briefly presented and discussed. The approach is attractive because maximum likelihood estimation and all the usual tools of statistical signal analysis can be brought to bear; the information extraction appears to be robust and computationally reasonable; the concepts are geometric and simple; and close-to-optimal accuracy should result.
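The paper estimates primitives directly from calibrated image patches; as a much-simplified sketch of only the underlying nonlinear least-squares idea, the following fits a sphere of unknown center and radius to noisy 3-D surface samples by gradient descent. All data, the noise level, and the step size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
true_c, true_r = np.array([1.0, -2.0, 0.5]), 3.0

# noisy samples of the sphere's surface
dirs = rng.standard_normal((500, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
pts = true_c + true_r * dirs + 0.01 * rng.standard_normal((500, 3))

# gradient descent on E(c, r) = mean_i (|p_i - c| - r)^2
c, r, lr = pts.mean(axis=0), 1.0, 0.1
for _ in range(2000):
    diff = pts - c
    d = np.linalg.norm(diff, axis=1)
    resid = d - r                                  # signed radial residuals
    c += 2 * lr * (resid[:, None] * diff / d[:, None]).mean(axis=0)
    r += 2 * lr * resid.mean()
```

Under Gaussian noise this least-squares objective is (approximately) the maximum likelihood estimate, which is the statistical machinery the abstract appeals to.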
Since antiquity, artisans have created flattened forms, often called "bas-reliefs," which give an exaggerated perception of depth when viewed from a particular vantage point. This paper presents an explanation of this phenomenon, showing that the ambiguity in determining the relief of an object is not confined to bas-relief sculpture but is implicit in the determination of the structure of any object. Formally, if the object's true surface is denoted by z_true = f(x, y), then we define the "generalized bas-relief transformation" as z = λf(x, y) + μx + νy, with a corresponding transformation of the albedo. For each image of a Lambertian surface f(x, y) produced by a point light source at infinity, there exists an identical image of a bas-relief produced by a transformed light source. This equality holds for both shaded and shadowed regions. Thus, the set of possible images (the illumination cone) is invariant under generalized bas-relief transformations. When μ = ν = 0 (e.g., a classical bas-relief sculpture), we show that the sets of possible motion fields are also identical. Thus, neither small unknown motions nor changes of illumination can resolve the bas-relief ambiguity. Implications of this ambiguity for structure recovery and shape representation are discussed.
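The image equality is easy to verify numerically for attached shadows. Writing b = a·n for the albedo-scaled normal field and G for the GBR matrix, the transformed surface and albedo correspond to b̄ = G⁻ᵀb, and the matched source is s̄ = Gs, so b̄·s̄ = b·s pixel by pixel. A minimal NumPy check, with random surface gradients and albedo and arbitrarily chosen λ, μ, ν:

```python
import numpy as np

rng = np.random.default_rng(3)
n_pix = 400

# a Lambertian surface z = f(x, y): per-pixel gradients and albedo
fx, fy = rng.standard_normal((2, n_pix))
albedo = rng.uniform(0.2, 1.0, n_pix)

# albedo-scaled unit normals b = a * n, with n parallel to (-f_x, -f_y, 1)
n = np.stack([-fx, -fy, np.ones(n_pix)])
b = albedo * n / np.linalg.norm(n, axis=0)

lam, mu, nu = 0.5, 0.3, -0.2                      # GBR parameters, lambda > 0
G = np.array([[1.0, 0, 0], [0, 1.0, 0], [mu, nu, lam]])
s = np.array([0.2, -0.1, 1.0])                    # distant point source

I = np.maximum(b.T @ s, 0.0)                      # image, with attached shadows
b_bar = np.linalg.inv(G).T @ b                    # GBR-transformed surface/albedo
s_bar = G @ s                                     # matched transformed source
I_bar = np.maximum(b_bar.T @ s_bar, 0.0)          # identical image
```

Because b̄·s̄ equals b·s exactly (not just in sign), the clamping that models attached shadows preserves the equality, matching the paper's claim that shaded and shadowed regions agree.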
We address the problem of large-scale fine-grained visual categorization, describing new methods we have used to produce an online field guide to 500 North American bird species. We focus on the challenges raised when such a system is asked to distinguish between highly similar species of birds. First, we introduce "one-vs-most classifiers." By eliminating highly similar species during training, these classifiers achieve more accurate and intuitive results than common one-vs-all classifiers. Second, we show how to estimate spatio-temporal class priors from observations that are sampled at irregular and biased locations. We show how these priors can be used to significantly improve performance. We then show state-of-the-art recognition performance on a new, large dataset that we make publicly available. These recognition methods are integrated into the online field guide, which is also publicly available.
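The paper's estimation of spatio-temporal priors from irregularly and biasedly sampled observations is more involved than can be shown here; the following only sketches how an estimated prior p(c | site) could re-weight classifier scores under a conditional-independence assumption. The function name and all numbers are invented for illustration.

```python
import numpy as np

def apply_site_prior(p_class_given_x, train_prior, site_prior):
    """Bayesian re-weighting: assuming the image x and the observation site
    are conditionally independent given the class,
        p(c | x, site)  proportional to  p(c | x) * p(c | site) / p(c)."""
    w = p_class_given_x * site_prior / train_prior
    return w / w.sum(axis=-1, keepdims=True)

scores = np.array([0.5, 0.3, 0.2])         # classifier output p(c | x)
train_prior = np.array([1, 1, 1]) / 3      # class frequencies in the training set
site_prior = np.array([0.05, 0.80, 0.15])  # p(c | location, season)
post = apply_site_prior(scores, train_prior, site_prior)
```

Here the prior flips the decision: the classifier's top class is implausible at this location and season, so the second-ranked species wins after re-weighting.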