Abstract
Creating realistic 3D animation remains a time-consuming and expertise-dependent process, requiring manual rigging, keyframing, and fine-tuning of complex motions. Meanwhile, video diffusion models have recently demonstrated remarkable 2D motion imagination, generating dynamic and visually coherent motion from text or image prompts. However, their outputs lack explicit 3D structure and cannot be directly used for animation or simulation. We present AniMimic, a framework that animates static 3D meshes using motion priors learned from video diffusion models. Starting from an input mesh, AniMimic synthesizes a monocular animation video, automatically constructs a skeleton with skinning weights, and refines its joint parameters through differentiable rendering and video-based supervision. To further enhance realism, we integrate a differentiable simulation module that refines mesh deformation through physically grounded soft-tissue dynamics. Our method bridges the creativity of video diffusion and the structural control of 3D rigged animation, producing physically plausible, temporally coherent, and artist-editable motion sequences that integrate seamlessly into standard animation pipelines.
Pipeline
From an input 3D mesh, we first render a canonical view and use a video diffusion model to generate a monocular motion sequence. We construct a skeleton with skinning weights using a feed-forward rigging model and generate animation by optimizing joint motions through differentiable rendering, tracking, and depth cues. Finally, we refine mesh deformation via differentiable simulation to obtain physically grounded and temporally consistent results. The circles on the right show novel views.
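To make the role of the skeleton and skinning weights concrete, the following is a minimal sketch of how joint transforms can drive mesh vertices. It assumes linear blend skinning, which the text above does not name explicitly; the function names (rotation_z, linear_blend_skinning) and the toy two-bone setup are illustrative and not taken from AniMimic.

```python
import numpy as np


def rotation_z(theta):
    """Return a 4x4 homogeneous rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


def linear_blend_skinning(rest_vertices, weights, bone_transforms):
    """Deform vertices as a weighted blend of per-bone rigid transforms.

    rest_vertices:   (V, 3) rest-pose vertex positions
    weights:         (V, B) skinning weights, each row summing to 1
    bone_transforms: (B, 4, 4) transforms from rest pose to current pose
    """
    num_vertices = rest_vertices.shape[0]
    # Homogeneous coordinates so that 4x4 transforms can be applied: (V, 4)
    homo = np.concatenate([rest_vertices, np.ones((num_vertices, 1))], axis=1)
    # Every vertex transformed by every bone: (B, V, 4)
    per_bone = np.einsum("bij,vj->bvi", bone_transforms, homo)
    # Weighted sum over bones for each vertex: (V, 4)
    blended = np.einsum("vb,bvi->vi", weights, per_bone)
    return blended[:, :3]


# Toy setup (hypothetical): three copies of the same point, bound to two bones
# with different weights. Bone 0 stays fixed, bone 1 rotates 90 degrees about z.
rest = np.array([[1.0, 0.0, 0.0]] * 3)
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
transforms = np.stack([np.eye(4), rotation_z(np.pi / 2)])

deformed = linear_blend_skinning(rest, weights, transforms)
print(deformed)
```

In this toy case the first vertex stays at (1, 0, 0), the third follows bone 1 to approximately (0, 1, 0), and the equally weighted vertex lands at their average, approximately (0.5, 0.5, 0). Because every step is a differentiable function of the joint angle, a loss computed on rendered output could in principle be propagated back to the joint parameters, which is the kind of optimization the pipeline describes.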
Qualitative Comparison
[Video comparisons: eight examples, each shown side by side as Reference, Ours, Puppeteer, DreamMesh, SC4D.]
Animation from Novel Views
[Videos: four examples, each rendered from three novel views (View 1, View 2, View 3).]
BibTeX
@inproceedings{xie2026animimic,
title={AniMimic: Imitating 3D Animation from Video Priors},
author={Xie, Tianyi and Chen, Yunuo and Guo, Yaowei and Yang, Yin and Zhou, Bolei and Terzopoulos, Demetri and Jiang, Ying and Jiang, Chenfanfu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={40266--40276},
year={2026}
}