Face Anything: 4D Face Reconstruction from Any Image Sequence

Upload up to 40 face images from a short clip, named so that the filenames sort in temporal order. The model jointly predicts depth and canonical facial coordinates in a single feed-forward pass. From these predictions we derive canonical, depth, and normal maps, plus dense, temporally consistent 3D point tracks.
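Because the frame order comes from the filenames, it is worth checking how your names sort before uploading. The sketch below is illustrative only: the page does not say which sort the demo applies, and `natural_key`, `order_frames`, and `zero_padded` are hypothetical helpers, not part of the demo's code. It shows why zero-padded names are the safe choice under a plain string sort.

```python
import re

MAX_FRAMES = 40  # upload limit stated on this page


def natural_key(name):
    """Split a name into text and number chunks so 'frame_10' sorts after 'frame_2'."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions are always the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def order_frames(filenames, limit=MAX_FRAMES):
    """Return the first `limit` filenames in natural (numeric-aware) order."""
    return sorted(filenames, key=natural_key)[:limit]


def zero_padded(index, width=3, ext=".png"):
    """Build a name that sorts correctly even under a plain string sort."""
    return f"frame_{index:0{width}d}{ext}"


names = ["frame_10.png", "frame_2.png", "frame_1.png"]
plain = sorted(names)            # plain string sort puts frame_10 before frame_2
natural = order_frames(names)    # numeric-aware order
padded = sorted(zero_padded(i) for i in (10, 2, 1))
```

If you rename frames with zero padding (`frame_001.png`, `frame_002.png`, ...), a plain sort and a natural sort agree, so the order does not depend on which one is used.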

Project page · arXiv · Code

Input images (up to 40, in temporal order)

Preview

…or upload a video (its first 40 frames are used)

Inference mode

One-by-one: more surface detail, lower memory. Joint (all-at-once): more 3D-consistent across frames.

Options: Joint · One-by-one

Face crop

Crop each frame to a face-centred square (pixel3dmm-style) so the model focuses on the face. Uncheck for full frames.
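A face-centred square crop can be sketched as follows. This is a minimal illustration, not the pixel3dmm procedure or the demo's own code: the face box is assumed to come from some detector, and the function name `square_crop_box` and the `scale` margin of 1.5 are hypothetical choices.

```python
def square_crop_box(face_box, image_size, scale=1.5):
    """Return (left, top, right, bottom) of a square centred on the face box.

    face_box:   (x0, y0, x1, y1) in pixels, assumed to come from a face detector.
    image_size: (width, height) of the frame.
    scale:      hypothetical margin factor around the larger face dimension.
    """
    x0, y0, x1, y1 = face_box
    width, height = image_size
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2

    # Side length: enlarged face extent, but never larger than the frame.
    side = int(min(scale * max(x1 - x0, y1 - y0), width, height))

    # Centre the square on the face, then shift it back inside the frame.
    left = min(max(round(cx - side / 2), 0), width - side)
    top = min(max(round(cy - side / 2), 0), height - side)
    return left, top, left + side, top + side


centred = square_crop_box((40, 30, 60, 50), (100, 80))
corner = square_crop_box((0, 0, 20, 20), (100, 80))
full = square_crop_box((0, 0, 100, 80), (100, 80))
```

Shifting the square back inside the frame, rather than padding, keeps the crop square without inventing pixels; the cost is that a face near the border is no longer exactly centred.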

Remove background

Uses Robust Video Matting (recommended).

Processing resolution

Higher values give more detail but use more memory. Values are multiples of 14.

Range: 252 to 1036
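To pick a valid resolution programmatically, a requested value can be clamped and snapped to the nearest multiple of 14. The sketch below takes 252 and 1036 as the bounds, an assumption based on the two values shown for this control; `snap_resolution` is a hypothetical helper, not part of the demo.

```python
STEP = 14                      # resolutions are multiples of 14
MIN_RES, MAX_RES = 252, 1036   # assumed slider bounds; both are multiples of 14


def snap_resolution(requested):
    """Clamp an integer resolution to the assumed range and snap it to a multiple of STEP."""
    clamped = min(max(requested, MIN_RES), MAX_RES)
    # Integer arithmetic; exact halves round up.
    return (clamped + STEP // 2) // STEP * STEP


snapped = [snap_resolution(r) for r in (100, 500, 518, 2000)]
```

Since both bounds are themselves multiples of 14, clamping first guarantees the snapped value stays inside the range.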

3D point cloud with colorful tracks · loads the whole sequence, then plays smoothly

Download point clouds (.zip: tracks/ + points/)

Surface-normal map

Log

Examples

NeRSemble 40 images

NeRSemble 1 image