TL;DR: An automatic, text-guided system that configures simulation parameters to produce realistic dynamic visuals.
Accurately simulating existing 3D objects across a wide variety of materials often demands expert knowledge and time-consuming tuning of physical parameters to achieve the desired dynamic behavior. We introduce MotionPhysics, an end-to-end differentiable framework that infers plausible physical parameters for a chosen 3D scene from a user-provided natural language prompt, removing the need for ground-truth trajectories or annotated videos as supervision. Our approach first uses a multimodal large language model to estimate material parameter values, which are constrained to lie within physically plausible ranges. We further propose a learnable motion distillation loss that extracts robust motion priors from pretrained video diffusion models, while minimizing their appearance and geometry inductive biases, to guide the simulation. We evaluate MotionPhysics on more than thirty scenarios, covering real-world, human-designed, and AI-generated 3D objects and spanning a wide range of materials such as elastic solids, metals, foams, sand, and both Newtonian and non-Newtonian fluids. MotionPhysics produces visually realistic dynamic simulations guided by natural language, surpassing the state of the art while automatically determining physically plausible parameters.
Overview. MotionPhysics simulates physically consistent dynamics from input text prompts by automatically estimating physical parameters for diverse input scenes, including AI-generated, real-world, and human-designed assets.
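To make the two-stage recipe concrete, the sketch below illustrates the general pattern in PyTorch: MLLM-estimated parameters are re-parameterized through a sigmoid so optimization stays inside plausible ranges, and a score-distillation-style loss treats a frozen video diffusion model as a motion prior on rendered frames. This is a minimal sketch, not the paper's implementation; all names (PARAM_RANGES, constrain, init_raw, motion_distillation_loss, denoiser) and the range values are illustrative assumptions, and the paper's actual learnable motion distillation loss additionally suppresses appearance and geometry biases in a way not reproduced here.

import torch

# Hypothetical plausible ranges per material parameter (illustrative values).
PARAM_RANGES = {
    "youngs_modulus": (1e4, 1e8),       # Pa
    "poisson_ratio":  (0.05, 0.45),
    "density":        (200.0, 4000.0),  # kg/m^3
}

def constrain(raw: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    # Map an unconstrained learnable value into (lo, hi) with a sigmoid,
    # so gradient steps can never push a parameter outside its range.
    return lo + (hi - lo) * torch.sigmoid(raw)

def init_raw(estimate: float, lo: float, hi: float) -> torch.Tensor:
    # Invert the sigmoid so optimization starts at the MLLM's estimate.
    t = min(max((estimate - lo) / (hi - lo), 1e-4), 1.0 - 1e-4)
    return torch.logit(torch.tensor(t)).requires_grad_(True)

def motion_distillation_loss(frames, denoiser, alphas):
    # Score-distillation-style objective: noise the rendered frames, ask a
    # frozen video diffusion model to predict that noise, and use the
    # residual as a gradient signal on the frames (and, through the
    # differentiable simulator, on the physical parameters).
    t = int(torch.randint(0, alphas.shape[0], (1,)))
    a = alphas[t]
    eps = torch.randn_like(frames)
    noisy = a.sqrt() * frames + (1.0 - a).sqrt() * eps
    with torch.no_grad():
        eps_hat = denoiser(noisy, t)  # stand-in for a pretrained model
    # The residual carries no gradient; all gradient flows through `frames`.
    return ((eps_hat - eps) * frames).sum()

# Toy usage: a zero "denoiser" stands in for a real video diffusion model.
mllm_estimates = {"youngs_modulus": 5e5, "poisson_ratio": 0.3, "density": 1100.0}
raw = {k: init_raw(v, *PARAM_RANGES[k]) for k, v in mllm_estimates.items()}
params = {k: constrain(r, *PARAM_RANGES[k]) for k, r in raw.items()}

frames = torch.rand(8, 3, 64, 64, requires_grad=True)  # rendered video stand-in
alphas = torch.linspace(0.9999, 0.01, 1000)
motion_distillation_loss(frames, lambda x, t: torch.zeros_like(x), alphas).backward()

The sigmoid re-parameterization is one common way to enforce box constraints under gradient descent; the abstract only states that estimates are constrained to plausible ranges, so the exact mechanism shown here is an assumption.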
@InProceedings{motionphysics2026,
  title     = {MotionPhysics: Learnable Motion Distillation for Text-Guided Simulation},
  author    = {Miaowei Wang and Jakub Zadrożny and Oisin Mac Aodha and Amir Vaxman},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2026}
}