Face Anything: 4D Face Reconstruction from Any Image Sequence

Upload up to 40 face images (a short clip, named so they sort in order). The model jointly predicts depth and canonical facial coordinates in a single feed-forward pass, from which we derive canonical / depth / normal maps and dense, temporally-consistent 3D point tracks.
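Since frames are expected to sort in order by name, one way to prepare a clip is to rename the files with zero-padded indices before uploading. The sketch below is only an illustration: the `frame_0000` naming pattern and the helper names are hypothetical, and how the demo itself sorts uploads is not stated here.

```python
import re
from pathlib import Path

MAX_FRAMES = 40  # upload limit stated above


def natural_key(name):
    """Sort key that compares digit runs as integers, so 'clip_2' precedes 'clip_10'."""
    # re.split with a capturing group puts digit runs at odd indices.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def zero_padded_names(filenames, max_frames=MAX_FRAMES):
    """Map original names to zero-padded names that sort correctly as plain strings.

    The 'frame_0000' pattern is an assumption for illustration only.
    """
    ordered = sorted(filenames, key=natural_key)[:max_frames]
    return {old: f"frame_{i:04d}{Path(old).suffix}" for i, old in enumerate(ordered)}
```

With zero-padded names, a plain alphabetical sort and a numeric sort give the same frame order, so the result does not depend on which one the uploader uses.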

Project page · arXiv · Code

Inference mode
One-by-one: more surface detail and lower memory use. Joint (all-at-once): better 3D consistency across frames.
Crop each frame to a face-centred square (pixel3dmm-style) so the model focuses on the face. Uncheck for full frames.
Remove the background with Robust Video Matting (recommended).
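The face-centred square crop mentioned above can be sketched as follows. This is a minimal illustration, not the demo's actual cropping code: it assumes a face bounding box is already available from some detector, and the `scale` margin and the function name are hypothetical choices rather than pixel3dmm's settings.

```python
def square_crop_box(img_w, img_h, face_box, scale=1.5):
    """Return (left, top, right, bottom) of a square crop centred on the face.

    face_box is (x0, y0, x1, y1) in pixels from any face detector (assumed input).
    scale is an assumed margin around the face; it is not taken from the demo.
    """
    x0, y0, x1, y1 = face_box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0

    # Square side: the larger face dimension plus margin, capped to the image.
    side = int(round(min(scale * max(x1 - x0, y1 - y0), img_w, img_h)))

    # Centre on the face, then shift the square back inside the image if needed.
    left = int(round(min(max(cx - side / 2.0, 0.0), img_w - side)))
    top = int(round(min(max(cy - side / 2.0, 0.0), img_h - side)))
    return left, top, left + side, top + side
```

The returned box can be passed to an image library's crop call, for example `Image.crop` in Pillow. Clamping rather than padding keeps the crop inside the frame, at the cost of the face being slightly off-centre near image borders.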

3D point cloud with colorful tracks · loads the whole sequence, then plays smoothly

Examples

NeRSemble 40 images

NeRSemble 1 image