To overcome the limitations of triangle- and point-based surfaces, several authors have recently investigated surface representations based on higher-order primitives, among them MPU, SLIM surfaces, dynamic skin surfaces, and higher-order iso-surfaces. Until now, these representations were not suitable for interactive applications because of the lack of an efficient rendering algorithm. In this paper we close this gap for implicit surface representations of degree two by developing highly optimized GPU implementations of the raycasting algorithm. We investigate techniques for fast incremental raycasting and cover per-fragment and per-quadric backface culling. We apply these approaches to the rendering of SLIM surfaces, quadratic iso-surfaces over tetrahedral meshes, and bilinear quadrilaterals. Compared to triangle-based surface approximations of similar geometric error, we achieve only slightly lower frame rates but much higher visual quality due to the quadratic approximation power of the underlying surfaces.
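The abstract does not spell out the intersection math, but for a degree-two implicit the per-fragment raycasting step reduces to solving a quadratic in the ray parameter. Below is a minimal CPU-side sketch of that step, assuming the quadric is given as a symmetric 4x4 matrix Q acting on homogeneous points; the paper's GPU-specific incremental updates and culling schemes are not reproduced here.

import numpy as np

def ray_quadric_intersect(Q, origin, direction):
    # Intersect a ray o + t*d with the implicit quadric p^T Q p = 0.
    # Q: symmetric 4x4 matrix; origin, direction: 3-vectors.
    o = np.append(origin, 1.0)        # homogeneous point
    d = np.append(direction, 0.0)     # homogeneous direction
    a = d @ Q @ d
    b = 2.0 * (o @ Q @ d)
    c = o @ Q @ o
    if abs(a) < 1e-12:                # degenerate: quadric is linear along this ray
        return (-c / b,) if abs(b) > 1e-12 else None
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                   # ray misses the quadric
    s = np.sqrt(disc)
    t0, t1 = (-b - s) / (2.0 * a), (-b + s) / (2.0 * a)
    return (min(t0, t1), max(t0, t1))

On the GPU, the same computation runs per pixel over the quadric's bounding geometry, with the nearer valid root kept for the depth output.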
Advances in speech-driven animation techniques allow the creation of convincing animations for virtual characters solely from audio data. Many existing approaches focus on facial and lip motion, and they often do not provide realistic animation of the inner mouth. This paper addresses the problem of speech-driven inner-mouth animation. Obtaining performance-capture data of the tongue and jaw from video alone is difficult because the inner mouth is only partially observable during speech. In this work, we introduce a large-scale speech and mocap dataset that focuses on capturing tongue, jaw, and lip motion. This dataset enables research on data-driven techniques for generating realistic inner-mouth animation from speech. We then propose a deep-learning-based method for accurate and generalizable speech-driven tongue and jaw animation, and we evaluate several encoder-decoder network architectures and audio feature encoders. We find that recent self-supervised deep-learning-based audio feature encoders are robust, generalize well to unseen speakers and content, and work best for our task. To demonstrate the practical application of our approach, we show animations on high-quality parametric 3D face models driven by the landmarks generated by our speech-to-tongue animation method.
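As a concrete illustration of the kind of encoder-decoder evaluated here, the sketch below maps per-frame self-supervised audio features (e.g. 768-dimensional wav2vec-2.0-style features) to 3D inner-mouth landmark positions. All layer sizes, the landmark count, and the GRU choice are illustrative assumptions, not the paper's reported architecture.

import torch
import torch.nn as nn

class SpeechToTongue(nn.Module):
    # Hypothetical encoder-decoder: per-frame audio features -> 3D landmarks.
    def __init__(self, feat_dim=768, n_landmarks=10, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        # one linear decoder head predicting 3D coordinates per landmark
        self.decoder = nn.Linear(2 * hidden, n_landmarks * 3)

    def forward(self, audio_feats):           # (batch, frames, feat_dim)
        h, _ = self.encoder(audio_feats)      # (batch, frames, 2*hidden)
        out = self.decoder(h)                 # (batch, frames, n_landmarks*3)
        return out.view(*out.shape[:2], -1, 3)

Swapping the input features between hand-crafted (e.g. spectrogram) and self-supervised encodings is what the architecture comparison in the paper amounts to at this level of abstraction.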
The combination of traditional rendering with neural networks in Deferred Neural Rendering (DNR) provides a compelling balance between computational complexity and realism of the resulting images. Using skinned meshes for rendering articulated objects is a natural extension of the DNR framework and would open it up to a plethora of applications. However, in this case the neural shading step must account for deformations that are possibly not captured in the mesh, as well as alignment inaccuracies and dynamics, which can confound the DNR pipeline. We present Articulated Neural Rendering (ANR), a novel framework based on DNR which explicitly addresses its limitations for virtual human avatars. We show the superiority of ANR not only over DNR but also over methods specialized for avatar creation and animation. In two user studies, we observe a clear preference for our avatar model, and we demonstrate state-of-the-art performance on quantitative evaluation metrics. Perceptually, we observe better temporal stability, level of detail, and plausibility.
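The core deferred neural shading step, a learned feature texture sampled at rasterized UVs and translated to RGB by a small network, can be sketched as below. Channel counts and the tiny convolutional head are illustrative choices; the ANR-specific handling of articulation, misalignment, and dynamics is not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTextureShader(nn.Module):
    # Sketch of deferred neural shading: sample a learned feature texture
    # at rasterized UV coordinates, then translate features to RGB.
    def __init__(self, tex_res=512, feat_ch=16):
        super().__init__()
        self.neural_texture = nn.Parameter(
            torch.randn(1, feat_ch, tex_res, tex_res) * 0.01)
        self.head = nn.Sequential(
            nn.Conv2d(feat_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, uv):                    # uv: (batch, H, W, 2) in [-1, 1]
        feats = F.grid_sample(
            self.neural_texture.expand(uv.shape[0], -1, -1, -1),
            uv, align_corners=False)          # (batch, feat_ch, H, W)
        return self.head(feats)               # (batch, 3, H, W) RGB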
This paper proposes a new marker-less approach to capturing human performances from multi-view video. Our algorithm can jointly reconstruct the spatio-temporally coherent geometry, motion, and textural surface appearance of actors performing complex and rapid moves. Furthermore, since our algorithm is purely mesh-based and makes as few prior assumptions as possible about the type of subject being tracked, it can even capture performances of people wearing wide apparel, such as a dancer wearing a skirt. To this end, our method efficiently and effectively combines the power of surface- and volume-based shape deformation techniques with a new mesh-based analysis-through-synthesis framework. This framework extracts motion constraints from video and makes the laser scan of the tracked subject mimic the recorded performance. Small-scale, time-varying shape detail is also recovered by applying model-guided multi-view stereo to refine the model surface. Our method delivers captured performance data at a high level of detail, is highly versatile, and is applicable to many complex types of scenes that could not be handled by alternative marker-based or marker-free recording techniques.
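The surface-based deformation stage can be illustrated with a standard Laplacian-editing least-squares solve: differential coordinates of the laser scan are preserved while handle vertices are pulled toward motion constraints extracted from the video. This is a generic sketch of that class of technique, not the paper's exact formulation; the Laplacian L, the handle selection, and the weight w are assumed inputs.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def deform_with_constraints(V, L, handle_ids, handle_pos, w=10.0):
    # V: (n, 3) vertex positions; L: (n, n) sparse mesh Laplacian.
    # handle_ids: (m,) constrained vertex indices; handle_pos: (m, 3) targets.
    delta = L @ V                         # differential coordinates to preserve
    # soft positional constraints as extra weighted rows
    C = sp.csr_matrix(
        (np.full(len(handle_ids), w),
         (np.arange(len(handle_ids)), handle_ids)),
        shape=(len(handle_ids), V.shape[0]))
    A = sp.vstack([L, C]).tocsc()
    # solve each coordinate independently in the least-squares sense
    V_new = np.column_stack([
        spla.lsqr(A, np.concatenate([delta[:, k], w * handle_pos[:, k]]))[0]
        for k in range(3)])
    return V_new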
We present TexMesh, a novel approach to reconstructing detailed human meshes with high-resolution full-body texture from RGB-D video. TexMesh enables high-quality free-viewpoint rendering of humans. Given the RGB frames, the captured environment map, and the coarse per-frame human mesh from RGB-D tracking, our method reconstructs spatiotemporally consistent and detailed per-frame meshes along with a high-resolution albedo texture. By using the incident illumination, we are able to accurately estimate local surface geometry and albedo, which allows us to further use photometric constraints to adapt a synthetically trained model to real-world sequences in a self-supervised manner for detailed surface geometry and high-resolution texture estimation. In practice, we train our models on a short example sequence for self-adaptation, after which the model runs at an interactive frame rate. We validate TexMesh on synthetic and real-world data and show that it outperforms the state of the art quantitatively and qualitatively.
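The photometric self-supervision described here can be sketched as follows: shade the current geometry and albedo estimate under the captured environment lighting and penalize the difference to the observed frame. The Lambertian shading stand-in and the masked L1 form below are illustrative assumptions, not the paper's exact loss.

import torch

def photometric_loss(albedo, normals, env_irradiance, image, mask):
    # albedo, image: (B, 3, H, W); normals: (B, 3, H, W); mask: (B, 1, H, W).
    # env_irradiance maps normals to per-pixel RGB irradiance from the
    # captured environment map (any diffuse lighting model stands in here).
    shaded = albedo * env_irradiance(normals)   # simple Lambertian shading
    diff = (shaded - image).abs() * mask        # only penalize foreground pixels
    return diff.sum() / mask.sum().clamp(min=1.0)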
Line primitives are a powerful visual element for scientific visualization, and in particular for 3D vector-field visualization. We extend basic line primitives with additional visual attributes, including color, line width, texture, and orientation. To implement these attributes we represent the stylized line primitives as generalized cylinders. One important contribution of our work is an efficient rendering algorithm for stylized lines, which is hybrid in the sense that it combines CPU- and GPU-based rendering. We further improve depth perception with a shadow algorithm. We present several applications of visualization with stylized lines, among them the visualization of 3D vector fields and molecular structures.
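A generalized cylinder in this sense is simply a cross-section swept along the line. The sketch below builds the ring vertices of such a tube on the CPU with a constant radius and a simple per-point frame; per-vertex radius, color, and texture coordinates would carry the stylization attributes described above, and the paper's hybrid CPU/GPU renderer is not reproduced.

import numpy as np

def tube_from_polyline(points, radius=0.05, n_sides=8):
    # Sweep a circular cross-section along a polyline to form a generalized
    # cylinder. Returns (n_points, n_sides, 3) ring vertices.
    points = np.asarray(points, dtype=float)
    n = len(points)
    rings = []
    for i, p in enumerate(points):
        # tangent by finite differences (forward/backward at the endpoints)
        t = points[min(i + 1, n - 1)] - points[max(i - 1, 0)]
        t /= np.linalg.norm(t)
        # any vector not parallel to t yields a usable local frame
        # (a parallel-transport frame would avoid twisting between rings)
        a = np.array([1.0, 0.0, 0.0]) if abs(t[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        u = np.cross(t, a); u /= np.linalg.norm(u)
        v = np.cross(t, u)
        ang = 2.0 * np.pi * np.arange(n_sides) / n_sides
        rings.append(p + radius * (np.outer(np.cos(ang), u)
                                   + np.outer(np.sin(ang), v)))
    return np.stack(rings)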