Code for paper "Deep Appearance Models for Face Rendering"

Abstract: We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup. Vertex positions and view-specific textures are modeled using a deep variational autoencoder that captures complex nonlinear effects while producing a smooth and compact latent representation. View-specific texture enables the modeling of view-dependent effects such as specularity. It can also correct for imperfect geometry stemming from biased or low resolution estimates. This is a significant departure from the traditional graphics pipeline, which requires highly accurate geometry as well as all elements of the shading model to achieve realism through physically-inspired light transport. Acquiring such a high level of accuracy is difficult in practice, especially for complex and intricate parts of the face, such as eyelashes and the oral cavity. These are handled naturally by our approach, which does not rely on precise estimates of geometry. Instead, the shading model accommodates deficiencies in geometry through the flexibility afforded by the neural network employed. At inference time, we condition the decoding network on the viewpoint of the camera in order to generate the appropriate texture for rendering. The resulting system can be implemented simply using existing rendering engines through dynamic textures with flat lighting. This representation, together with a novel unsupervised technique for mapping images to facial states, results in a system that is naturally suited to real-time interactive settings such as Virtual Reality (VR).
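For intuition, the decoder half of such a model can be sketched in a few lines. The following is a minimal PyTorch illustration, assuming toy layer sizes and a simple concatenation-based view conditioning; it is a sketch, not the released architecture.

import torch
import torch.nn as nn

class ViewConditionedDecoder(nn.Module):
    """Toy sketch of a deep appearance model decoder: a shared latent code
    is decoded into mesh vertex positions, while the texture branch is
    additionally conditioned on the viewing direction. Sizes are illustrative."""

    def __init__(self, latent_dim=128, n_verts=1024, tex_res=64):
        super().__init__()
        self.tex_res = tex_res
        # Geometry branch: latent code -> flattened (x, y, z) vertex positions.
        self.geom = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, n_verts * 3),
        )
        # Texture branch: latent code + view direction -> RGB texture.
        self.tex = nn.Sequential(
            nn.Linear(latent_dim + 3, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 3 * tex_res * tex_res),
        )

    def forward(self, z, view_dir):
        verts = self.geom(z).view(-1, 3)                 # (n_verts, 3)
        tex = self.tex(torch.cat([z, view_dir], dim=-1))
        return verts, tex.view(3, self.tex_res, self.tex_res)

decoder = ViewConditionedDecoder()
z = torch.randn(128)                      # latent code from the encoder
view_dir = torch.tensor([0.0, 0.0, 1.0])  # unit vector toward the camera
verts, tex = decoder(z, view_dir)

Because the texture depends on view_dir, re-decoding with a new camera direction regenerates view-dependent effects such as specular highlights without any explicit shading model.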

Multiface Dataset

Our dataset consists of high-quality recordings of the faces of 13 identities, each captured in a multi-view capture stage while performing various facial expressions. Recordings were captured at 30 fps, with an average of 12,200 (v1 scripts) to 23,000 (v2 scripts) frames per subject. Each frame includes roughly 40 (v1) to 160 (v2) different camera views under uniform illumination, yielding a total dataset size of 65 TB. We provide the raw captured images from each camera view at a resolution of 2048 × 1334 pixels, tracked meshes including head poses, unwrapped textures at 1024 × 1024 pixels, metadata including intrinsic and extrinsic camera calibrations, and audio. This repository hosts the code for downloading the dataset and building a Codec Avatar using a deep appearance model. To learn more about how the dataset was captured and how different model architectures influence performance, please refer to our Technical Report.
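After downloading, a quick way to sanity-check the assets is to load one frame's unwrapped texture and tracked mesh. The directory layout, expression name, and frame number below are assumptions for illustration only; consult the downloaded tree for the actual structure.

from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical paths; substitute an actual entity, expression, and frame.
root = Path("/path/to/mini_dataset/entity")

# Unwrapped texture: a 1024 x 1024 RGB image.
tex = np.asarray(Image.open(root / "unwrapped_uv_1024" / "E001" / "000000.png"))
print("texture:", tex.shape)

# Tracked mesh: a Wavefront .obj; read just the vertex positions.
verts = []
with open(root / "tracked_mesh" / "E001" / "000000.obj") as f:
    for line in f:
        if line.startswith("v "):
            verts.append([float(x) for x in line.split()[1:4]])
print("vertices:", np.array(verts).shape)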


  1. Features
  2. Installation
  3. Quick Start
  4. Works Using this Dataset
  5. Contributors
  6. Citation
  7. License



Quick Data Exploration

To download our data, first clone this repository and install the dependencies:

git clone https://github.com/facebookresearch/multiface.git
cd multiface
pip3 install -r requirements.txt

Since the full dataset takes up terabytes of storage, you may wish to download only part of it. To view the example assets, you can download the mini-dataset (< 1 GB):

python3 download_dataset.py --dest "/path/to/mini_dataset/" --download_config "./mini_download_config.json"

The --download_config argument points to a configuration file specifying which assets to download. The options include:

Variable     Type             Default
entity       list of string   All entities will be downloaded
image        boolean          Raw images of the selected entities will be downloaded
mesh         boolean          Tracked meshes of the selected entities will be downloaded
texture      boolean          Unwrapped textures of the selected entities will be downloaded
metadata     boolean          Metadata of the selected entities will be downloaded
audio        boolean          Audio of the selected entities will be downloaded
expression   list of string   All facial expressions (both v1 and v2 scripts) will be downloaded

The configuration to download all assets can be found at download_config.json.
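For illustration, a pared-down configuration selecting a single subject and a subset of assets might look like the following. The entity and expression names here are placeholders; download_config.json remains the authoritative reference for the exact keys and values.

{
  "entity": ["entity_id_placeholder"],
  "image": false,
  "mesh": true,
  "texture": true,
  "metadata": true,
  "audio": false,
  "expression": ["expression_name_placeholder"]
}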

Full Installation

To run training and render the 3D faces, please refer to our full installation guide.

Quick Start

To learn more about selecting the model architecture, camera split, and expression split for the training and testing sets, please refer to the quick start guide; a toy sketch of such a split is shown below.
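As an illustration of the idea, held-out cameras and expressions can be selected with a few lines of Python. The camera IDs and expression names below are made up; the actual splits are defined in the quick start materials.

import random

# Hypothetical camera IDs and expression names.
cameras = [f"cam{i:03d}" for i in range(40)]
expressions = [f"E{n:03d}" for n in range(1, 66)]

random.seed(0)
test_cameras = set(random.sample(cameras, 4))          # held-out viewpoints
test_expressions = set(random.sample(expressions, 5))  # held-out expressions

train_cameras = [c for c in cameras if c not in test_cameras]
train_expressions = [e for e in expressions if e not in test_expressions]

print(f"train: {len(train_cameras)} cameras x {len(train_expressions)} expressions")
print(f"test:  {len(test_cameras)} cameras x {len(test_expressions)} expressions")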

Works Using this Dataset

Deep Appearance Models for Face Rendering
Learning Compositional Radiance Fields of Dynamic Human Heads
Pixel Codec Avatars
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
Deep Incremental Learning for Efficient High-Fidelity Face Tracking
Modeling Facial Geometry using Compositional VAEs
Strand-accurate Multi-view Hair Capture
Mixture of Volumetric Primitives for Efficient Neural Rendering [Code Available]
Human Hair Inverse Rendering using Multi-View Photometric Data


Contributors

Thanks to all the people who have helped generate and maintain this dataset!


Citation

If you use any data from this dataset or any code released in this repository, please cite the technical report:

@misc{wuu2022multiface,
  title = {Multiface: A Dataset for Neural Face Rendering},
  author = {Wuu, Cheng-hsin and Zheng, Ningyuan and Ardisson, Scott and Bali, Rohan and Belko, Danielle and Brockmeyer, Eric and Evans, Lucas and Godisart, Timothy and Ha, Hyowon and Hypes, Alexander and Koska, Taylor and Krenn, Steven and Lombardi, Stephen and Luo, Xiaomin and McPhail, Kevyn and Millerschoen, Laura and Perdoch, Michal and Pitts, Mark and Richard, Alexander and Saragih, Jason and Saragih, Junko and Shiratori, Takaaki and Simon, Tomas and Stewart, Matt and Trimble, Autumn and Weng, Xinshuo and Whitewolf, David and Wu, Chenglei and Yu, Shoou-I and Sheikh, Yaser},
  year = {2022},
  doi = {10.48550/ARXIV.2207.11243},
  url = {https://arxiv.org/abs/2207.11243}
}


License

Multiface is CC-BY-NC 4.0 licensed, as found in the LICENSE file.
