HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs

Artem Sevastopolsky, Philip-William Grassal, Simon Giebenhain, ShahRukh Athar, Luisa Verdoliva, Matthias Nießner

University of Naples Federico II, TUM

We present HeadCraft, a generative model of highly detailed, animation-ready human heads. Our method is trained on 2D displacement maps obtained by registering a parametric template head with free surface displacements to a large set of 3D head scans. The resulting model is highly versatile: for instance, its latent code can be fitted to an arbitrary depth observation.



Recent advances in human head modeling make it possible to generate plausible-looking 3D head models via neural representations such as NeRFs and SDFs. Nevertheless, constructing complete high-fidelity head models with explicitly controlled animation remains an issue. Furthermore, completing the head geometry from a partial observation, e.g. from a depth sensor, while preserving a high level of detail is often problematic for existing methods.

We introduce a generative model for detailed 3D head meshes on top of an articulated 3DMM, which allows for explicit animation and high-detail preservation at the same time. Our method is trained in two stages. First, we register a parametric head model with vertex displacements to each mesh of the recently introduced NPHM dataset of accurate 3D head scans. The estimated displacements are baked into a hand-crafted UV layout. Second, we train a StyleGAN model, which we refer to as HeadCraft, to generalize over the UV maps of displacements. The decomposition into a parametric model and high-quality vertex displacements allows us to animate the model and modify its regions semantically. We demonstrate the results of unconditional generation and fitting to full or partial observations.
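The baking step described above, writing the per-vertex displacements estimated during registration into a UV layout, can be sketched as a simple scatter operation. The following is a minimal NumPy illustration, not the authors' implementation: it splats each vertex's offset into its nearest texel, whereas a real pipeline would rasterize over triangles to fill every texel of the map. The function name and resolution are illustrative.

```python
import numpy as np

def bake_displacements(uvs, displacements, resolution=256):
    """Bake per-vertex 3D displacement vectors into a UV offset map.

    uvs:           (N, 2) per-vertex UV coordinates in [0, 1]^2.
    displacements: (N, 3) per-vertex XYZ offsets from registration.

    Nearest-texel splatting only (illustrative); texels covered by
    several vertices receive the average of their offsets.
    """
    H = W = resolution
    uv_map = np.zeros((H, W, 3))
    count = np.zeros((H, W, 1))
    # Map continuous UVs to integer texel indices.
    x = np.clip((uvs[:, 0] * (W - 1)).round().astype(int), 0, W - 1)
    y = np.clip((uvs[:, 1] * (H - 1)).round().astype(int), 0, H - 1)
    # Unbuffered scatter-add, correct even with repeated indices.
    np.add.at(uv_map, (y, x), displacements)
    np.add.at(count, (y, x), 1.0)
    return uv_map / np.maximum(count, 1.0)
```

A StyleGAN generator can then be trained directly on such maps, since they are ordinary fixed-resolution 3-channel images.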


Interpolation in the Latent Space

Here is an interactive viewer for latent-space interpolation. The generated displacements are applied to the same FLAME template. Drag the blue cursor to linearly interpolate between four different latents; the resulting geometry is displayed from three views on the right.

Latent Shape Coordinates
(Bilinear interpolation between four corner latents.)
Corresponding geometry from three views (frontal, side, top).
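The quadrilateral interpolation driven by the cursor amounts to standard bilinear blending of the four corner latent codes. A minimal sketch, assuming the corners are ordered top-left, top-right, bottom-left, bottom-right (function name and ordering are illustrative, not taken from the released code):

```python
import numpy as np

def interpolate_latents(corners, u, v):
    """Bilinearly interpolate between four corner latent codes.

    corners: (4, D) array ordered [top-left, top-right,
             bottom-left, bottom-right].
    u, v:    cursor position in [0, 1]^2
             (u runs left-to-right, v top-to-bottom).
    """
    tl, tr, bl, br = corners
    top = (1 - u) * tl + u * tr        # blend along the top edge
    bottom = (1 - u) * bl + u * br     # blend along the bottom edge
    return (1 - v) * top + v * bottom  # blend between the two edges
```

At the corners this reproduces the corner latents exactly, and at the center it averages all four; the blended latent is then decoded into a UV offset map by the generator.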

Method Overview

In the registration stage, we (a) fit the FLAME template to the scan from the NPHM dataset via facial landmarks and subdivide it heavily, (b) optimize per-vertex vector displacements to fit the rough geometry under strong regularization, (c) optimize scalar refinements of the displacements along the normal directions, and (d) bake the displacements into a UV offset map. To generalize over the UV offset maps, we train a StyleGAN2 model. After training, the offsets can be applied to an arbitrary FLAME template by subdividing it and (e) querying the generated UV offset map at the (u, v) locations of the FLAME vertices.
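Step (e), querying the generated offset map at the vertices' (u, v) locations, is a texture lookup with bilinear filtering. A minimal NumPy sketch under assumed conventions (UVs in [0, 1]^2, map stored row-major with v along the first axis; names are illustrative):

```python
import numpy as np

def apply_uv_offsets(vertices, uvs, offset_map):
    """Displace subdivided template vertices by sampling a UV offset map.

    vertices:   (N, 3) subdivided FLAME vertex positions.
    uvs:        (N, 2) per-vertex UV coordinates in [0, 1]^2.
    offset_map: (H, W, 3) generated displacement map (XYZ offsets).
    """
    H, W, _ = offset_map.shape
    # Continuous pixel coordinates of each (u, v) query.
    x = uvs[:, 0] * (W - 1)
    y = uvs[:, 1] * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    # Bilinear blend of the four neighbouring texels.
    off = ((1 - wx) * (1 - wy) * offset_map[y0, x0]
           + wx * (1 - wy) * offset_map[y0, x1]
           + (1 - wx) * wy * offset_map[y1, x0]
           + wx * wy * offset_map[y1, x1])
    return vertices + off
```

Because the displacements live in the template's UV space, the same generated map can be applied to any FLAME instance, which is what enables animation of the generated detail.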

Related Links

For more work on similar tasks, please check out:

  • NPHM learns a parametric head model over a set of manually collected high-fidelity head scans. The head is represented as an SDF modeled as an ensemble of MLPs and can be controlled by identity and expression. MonoNPHM adds a color model and extends NPHM to monocular RGB images.
  • PanoHead extends EG3D with a 360° head representation.
  • FaceVerse learns a neural face model over a set of collected high-quality RGB-D head scans.
  • Neural Head Avatars and ROME represent the head as a combination of FLAME and displacements to either fit it to a multi-view talking head video or to a single image.

BibTeX

      title={HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs},
      author={Sevastopolsky, Artem and Grassal, Philip-William and Giebenhain, Simon and Athar, Shah{R}ukh and Verdoliva, Luisa and Nie{\ss}ner, Matthias},