AniMimic: Imitating 3D Animation from Video Priors

CVPR 2026

Tianyi Xie¹,* Yunuo Chen¹,* Yaowei Guo¹,* Yin Yang² Bolei Zhou¹ Demetri Terzopoulos¹ Ying Jiang¹ Chenfanfu Jiang¹

¹UCLA, ²University of Utah
*Indicates equal contribution

AniMimic generates realistic dynamics for objects with diverse geometries by optimizing joint articulations and material parameters using video-based model priors.

Abstract

Creating realistic 3D animation remains a time-consuming and expertise-dependent process, requiring manual rigging, keyframing, and fine-tuning of complex motions. Meanwhile, video diffusion models have recently demonstrated remarkable 2D motion imagination, generating dynamic and visually coherent motion from text or image prompts. However, their outputs lack explicit 3D structure and cannot be directly used for animation or simulation. We present AniMimic, a framework that animates static 3D meshes using motion priors learned from video diffusion models. Starting from an input mesh, AniMimic synthesizes a monocular animation video, automatically constructs a skeleton with skinning weights, and refines its joint parameters through differentiable rendering and video-based supervision. To further enhance realism, we integrate a differentiable simulation module that refines mesh deformation through physically grounded soft-tissue dynamics. Our method bridges the creativity of video diffusion and the structural control of 3D rigged animation, producing physically plausible, temporally coherent, and artist-editable motion sequences that integrate seamlessly into standard animation pipelines.

Pipeline

From an input 3D mesh, we first render a canonical view and use a video diffusion model to generate a monocular motion sequence. We construct a skeleton with skinning weights using a feed-forward rigging model and generate animation by optimizing joint motions through differentiable rendering, tracking, and depth cues. Finally, we refine mesh deformation via differentiable simulation to obtain physically grounded and temporally consistent results. The circles on the right of the figure show novel views.
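As a rough illustration of the rig-then-optimize idea in this pipeline, the sketch below poses a tiny 2D "mesh" with linear blend skinning and recovers a single joint angle by gradient descent on a point-matching loss. It is not the authors' implementation: the two-bone rig, the function names (`skin`, `fit_angle`), the finite-difference gradient, and the use of 2D target points in place of differentiable rendering, tracking, and depth supervision are all assumptions made for brevity.

```python
import math

def rotate_about(p, pivot, angle):
    """Rotate the 2D point p around pivot by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = p[0] - pivot[0], p[1] - pivot[1]
    return (pivot[0] + c * dx - s * dy, pivot[1] + s * dx + c * dy)

def skin(rest_vertices, weights, pivot, angle):
    """Linear blend skinning for a hypothetical two-bone rig.

    Bone 0 stays fixed; bone 1 rotates about pivot by angle.
    weights[i] is vertex i's weight for bone 1, and (1 - weights[i])
    is its weight for bone 0.
    """
    posed = []
    for v, w in zip(rest_vertices, weights):
        r = rotate_about(v, pivot, angle)
        posed.append(((1 - w) * v[0] + w * r[0], (1 - w) * v[1] + w * r[1]))
    return posed

def loss(rest_vertices, weights, pivot, angle, targets):
    """Sum of squared distances between posed vertices and target points."""
    posed = skin(rest_vertices, weights, pivot, angle)
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(posed, targets))

def fit_angle(rest_vertices, weights, pivot, targets,
              steps=200, lr=0.05, eps=1e-5):
    """Gradient descent on the joint angle.

    A central finite difference stands in for the gradient that a
    differentiable pipeline would supply automatically.
    """
    angle = 0.0
    for _ in range(steps):
        grad = (loss(rest_vertices, weights, pivot, angle + eps, targets)
                - loss(rest_vertices, weights, pivot, angle - eps, targets)) / (2 * eps)
        angle -= lr * grad
    return angle

# Toy data (hypothetical): a horizontal strip with a joint at x = 1.
REST = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (3.0, 0.0)]
WEIGHTS = [0.0, 0.0, 0.5, 1.0, 1.0]
PIVOT = (1.0, 0.0)

# Targets play the role of per-frame supervision extracted from a video.
TARGETS = skin(REST, WEIGHTS, PIVOT, 0.6)
recovered_angle = fit_angle(REST, WEIGHTS, PIVOT, TARGETS)
```

In this toy setup the targets come from the same rig, so the fit can recover the angle exactly; with supervision taken from a generated video, the loss would instead compare rendered images, tracked points, or depth, and one set of joint parameters would be optimized per frame.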

Qualitative Comparison

Each of the eight examples compares, in order: Reference, Ours, Puppeteer, DreamMesh, SC4D.

Animation from Novel Views

BibTeX

@inproceedings{xie2026animimic,
  title={AniMimic: Imitating 3D Animation from Video Priors},
  author={Xie, Tianyi and Chen, Yunuo and Guo, Yaowei and Yang, Yin and Zhou, Bolei and Terzopoulos, Demetri and Jiang, Ying and Jiang, Chenfanfu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={40266--40276},
  year={2026}
}