Equipping humanoid robots with versatile interaction skills typically requires either extensive task-specific policy training or explicit human-to-robot motion retargeting. However, learning-based policies are hindered by prohibitive data collection costs, which limit their scalability. Meanwhile, retargeting paradigms rely heavily on human-centric pose estimation (e.g., SMPL), which inevitably introduces a morphology gap between the human and the robot. The resulting skeletal scale mismatches cause severe spatial misalignments when motions are mapped to the robot, compromising interaction success.
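To make the morphology gap concrete, the following minimal sketch uses planar two-link forward kinematics. It shows what happens when joint angles estimated from a human are copied verbatim onto a shorter robot leg. All limb lengths and angles are hypothetical and were chosen only for illustration; they are not taken from SMPL or the Unitree G1. Verbatim angle copying is also the simplest possible retargeting scheme, not the specific baseline pipeline evaluated here.

```python
import math


def foot_position(hip_pitch, knee_pitch, thigh_len, shank_len):
    """Planar two-link forward kinematics of one leg in the sagittal plane.

    Angles are in radians: hip_pitch is measured from the downward vertical,
    knee_pitch is relative to the thigh. Returns the foot's (x, z) in a
    ground frame whose origin lies directly below the hip, assuming the hip
    height equals the straight-leg length (thigh_len + shank_len).
    """
    hip_height = thigh_len + shank_len
    # Knee position relative to the hip.
    knee_x = thigh_len * math.sin(hip_pitch)
    knee_z = -thigh_len * math.cos(hip_pitch)
    # The shank's absolute angle is the sum of the hip and knee angles.
    shank_angle = hip_pitch + knee_pitch
    foot_x = knee_x + shank_len * math.sin(shank_angle)
    foot_z = knee_z - shank_len * math.cos(shank_angle)
    return foot_x, hip_height + foot_z


# Hypothetical limb lengths in metres (illustrative, not measured values).
HUMAN = {"thigh_len": 0.45, "shank_len": 0.43}
ROBOT = {"thigh_len": 0.30, "shank_len": 0.30}

# Hypothetical kick pose estimated from a human video, copied verbatim to the robot.
HIP_PITCH, KNEE_PITCH = 0.6, -0.3

human_x, human_z = foot_position(HIP_PITCH, KNEE_PITCH, **HUMAN)
robot_x, robot_z = foot_position(HIP_PITCH, KNEE_PITCH, **ROBOT)

print(f"human foot: x={human_x:.3f} m, z={human_z:.3f} m")  # x=0.381, z=0.098
print(f"robot foot: x={robot_x:.3f} m, z={robot_z:.3f} m")  # x=0.258, z=0.066
print(f"forward reach gap: {human_x - robot_x:.3f} m")      # 0.123
```

With these illustrative numbers, the robot's foot lands roughly 12 cm short of where the human's foot reached, so a ball placed for the human motion would be missed. Practical retargeting pipelines often add scaling or inverse kinematics. Even so, any residual skeletal mismatch still appears as a spatial offset at the intended contact point.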
In this work, we propose Dream2Act, a robot-centric framework that enables zero-shot interaction through generative video synthesis. Given a third-person image of the predefined robot and the target object, our framework leverages video generation models to synthesize videos in which the physical robot completes the task with spatially aligned, morphology-consistent motion.
We evaluate Dream2Act on the Unitree G1 across four categories of whole-body mobile interaction tasks. Dream2Act achieves an overall task success rate of 37.5%, compared with 0% for conventional retargeting pipelines. It maintains robot-consistent spatial alignment throughout execution and enables reliable contact formation.
Qualitative comparison on diverse spatially sensitive tasks: Ball Kicking, Box Hugging, Bag Punching, and Sofa Sitting.
Panels (repeated for each task): GVHMR (Human), Baseline, Seedance2.0, Dream2Act (Ours).
@misc{xu2026morphologyconsistenthumanoidinteractionrobotcentric,
title={Morphology-Consistent Humanoid Interaction through Robot-Centric Video Synthesis},
author={Weisheng Xu and Jian Li and Yi Gu and Bin Yang and Haodong Chen and Shuyi Lin and Mingqian Zhou and Jing Tan and Qiwei Wu and Xiangrui Jiang and Taowen Wang and Jiawen Wen and Qiwei Liang and Jiaxi Zhang and Renjing Xu},
year={2026},
eprint={2603.19709},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.19709},
}