Overview
We present Virtual Fitting Room (VFR), a video generation framework for creating minute-scale try-on videos from a single user image, a target garment, and a reference motion. Unlike traditional try-on methods that produce short clips or static images, VFR supports arbitrarily long, high-resolution videos with natural user-garment interactions.
Our approach segments long sequences into overlapping clips and enforces local smoothness between adjacent segments while preserving global temporal consistency using a compact anchor video. This design enables stable synthesis over extended motions without relying on long training sequences.
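To make the segmentation idea concrete, here is a minimal sketch of how a long frame sequence could be split into overlapping clips. It is not the authors' implementation: the function name `overlapping_clips` and the parameters `clip_len` and `overlap` are hypothetical, and the values used are arbitrary. The anchor video and the generation model itself are not modeled here.

```python
def overlapping_clips(num_frames, clip_len=16, overlap=4):
    """Return (start, end) frame ranges, end exclusive, covering num_frames.

    Adjacent clips share `overlap` frames. The parameter names and default
    values are illustrative assumptions, not taken from the paper.
    """
    if clip_len <= 0 or not 0 <= overlap < clip_len:
        raise ValueError("need clip_len > 0 and 0 <= overlap < clip_len")
    if num_frames <= 0:
        return []

    stride = clip_len - overlap  # how far each new clip advances
    clips = []
    start = 0
    while True:
        end = min(start + clip_len, num_frames)  # last clip may be shorter
        clips.append((start, end))
        if end == num_frames:
            break
        start += stride
    return clips


def shared_frames(prev_clip, next_clip):
    """Frame indices that two adjacent clips have in common."""
    return list(range(next_clip[0], prev_clip[1]))
```

For example, `overlapping_clips(10, clip_len=4, overlap=1)` yields three clips in which each pair of neighbours shares exactly one frame. Those shared frames are where a smoothness constraint between adjacent segments could be applied.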
The generated videos exhibit appearance consistency from all viewpoints, robust handling of self-occlusions and body interactions, and preservation of fine details such as accessories. As a by-product, VFR also supports free-viewpoint rendering, enabling novel camera angles from a single input image.
Various motions.
Demo Video
Result Gallery
Citation
Acknowledgements
We thank you and the other visitors for visiting our project page.