SimWeaver: Zero-Shot RGB Sim-to-Real for Deformable Manipulation

Wenkang Hu¹, Haoran Wang¹, Yitong Li¹, Liu Liu², Mengao Zhao², Lai Jiang¹, Xincheng Tang¹, Junhang Wei³, Zhengjie Shu¹, Zhendong Wang³, Zhizhong Su², Huamin Wang³, Ruigang Yang¹,†
¹Shanghai Jiao Tong University · ²Horizon Robotics · ³Style3D Research

200 simulated demonstrations per task · no teleoperation · 91.30% real-world success across 5 deformable tasks.

[Video: Flatten Silk · 95.65% real success · 200 sim demos · Real Robot · Zero-Shot · No Fine-Tuning]
TL;DR

Train deformable manipulation policies entirely in simulation, with no teleoperation, deployed zero-shot on real hardware.

200
simulated demos per task

No teleoperation at any stage. Single-seed deterministic synthesis.

91.30%
average real success · n=115 pooled

Across 5 deformable tasks. n=23 consecutive trials per task; Wilson 95% CI [84.7, 95.2].

100%
silk grasp under visual shift

Texture / lighting / rotation OOD. Real-data baseline drops to 13.0% / 69.6% / 8.7%.

$0.03
per usable trajectory

2824 trajectories per day on a single 8×RTX 4090 server — two orders of magnitude cheaper than real-robot collection.
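Taken together, the two cost figures imply a daily compute budget. The back-of-the-envelope Python check below only multiplies and divides the numbers quoted above. It assumes the whole server cost is charged to usable trajectories, and because $0.03 is presumably rounded, the derived values (about $85 per day, roughly $0.44 per GPU-hour) are approximate.

```python
# Figures quoted in the TL;DR above.
COST_PER_TRAJ_USD = 0.03   # reported cost per usable trajectory (likely rounded)
TRAJ_PER_DAY = 2824        # usable trajectories per day on one server
NUM_GPUS = 8               # 8x RTX 4090

# Derived quantities (approximate, inheriting the rounding of $0.03).
daily_cost = COST_PER_TRAJ_USD * TRAJ_PER_DAY      # implied server cost per day
cost_per_gpu_hour = daily_cost / (NUM_GPUS * 24)   # implied cost per GPU-hour
traj_per_hour = TRAJ_PER_DAY / 24                  # server-wide throughput
traj_per_gpu_day = TRAJ_PER_DAY / NUM_GPUS         # per-GPU throughput

print(f"implied daily cost:    ${daily_cost:.2f}")
print(f"implied cost/GPU-hour: ${cost_per_gpu_hour:.3f}")
print(f"trajectories/hour:     {traj_per_hour:.1f}")
print(f"trajectories/GPU/day:  {traj_per_gpu_day:.0f}")
```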

Why this has been hard

Deformable sim-to-real has been blocked by three structural barriers.

BARRIER 01

Simulator unreliability

Cloth–rigid contact on thin fabrics breaks down in both Isaac Sim and Newton VBD: parameter-sensitive instability, penetration, and non-deterministic replay.

BARRIER 02

Trajectory synthesis fails on deformables

High-DOF deformation breaks rigid trajectory transfer; deformable-specific methods still need human teleoperation to get started and ignore both arm and cloth constraints.

BARRIER 03

RGB sim-to-real has stalled

Pixel-based policies still don’t transfer reliably; depth and point-cloud inputs sidestep the gap but collapse on dark, low-texture, and reflective fabrics.

Sim-to-Real Evidence

What the policy saw in simulation — and what it did on the real robot.

Left: the policy’s actual training observation (front oblique camera). Right: real-world execution with no fine-tuning. Per-task success rates from n=23 consecutive trials.

TASK 01

Silk Grasp

Bimanual grasp of thin reflective silk. Head-to-head vs. real-data baseline.
100.00% n=23
[Video: SIM · POLICY INPUT (L wrist, R wrist views) | REAL · ZERO-SHOT]
TASK 02

Silk Unfolding

Recover flat layout from a draped configuration of reflective fabric.
95.65% n=23
[Video: SIM · POLICY INPUT (L wrist, R wrist views) | REAL · ZERO-SHOT]
TASK 03

Garment Folding

Bimanual T-shirt full-fold with corner alignment.
91.30% n=23
[Video: SIM · POLICY INPUT (L wrist, R wrist views) | REAL · ZERO-SHOT]
TASK 04

Snack Packaging

Insertion into a thin plastic bag, a category prior sim-to-real work has left open.
86.96% n=23
[Video: SIM · POLICY INPUT (L wrist, R wrist views) | REAL · ZERO-SHOT]
TASK 05

Garment Unfolding

Recover flat T-shirt layout from a wrinkled start.
82.61% n=23
[Video: SIM · POLICY INPUT (L wrist, R wrist views) | REAL · ZERO-SHOT]
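The per-task rates correspond to 23, 22, 21, 20, and 19 successes out of 23 trials each, which pool to 105/115 = 91.30%. The pooled Wilson interval quoted in the TL;DR can be reproduced from those counts with the standard Wilson score formula. Below is a minimal Python sketch, assuming z = 1.96 for 95% coverage.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return center - half_width, center + half_width

# Per-task successes implied by the reported rates on n=23 trials each.
per_task = {
    "Silk Grasp": 23,         # 100.00%
    "Silk Unfolding": 22,     # 95.65%
    "Garment Folding": 21,    # 91.30%
    "Snack Packaging": 20,    # 86.96%
    "Garment Unfolding": 19,  # 82.61%
}
successes = sum(per_task.values())  # 105
trials = 23 * len(per_task)         # 115

low, high = wilson_interval(successes, trials)
print(f"{successes}/{trials} = {100 * successes / trials:.2f}%")
print(f"Wilson 95% CI: [{100 * low:.1f}, {100 * high:.1f}]")
```

Applied to a single task, the same function gives roughly [62.9%, 93.0%] for Garment Unfolding alone (19/23). That width is why the pooled n=115 interval is the tighter summary.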
Qualitative rollout · five tasks

[Figure: five-task sim-to-real rollout keyframes. Same policy checkpoint: sim rollout keyframes vs. zero-shot real deployment.]

System

Four components, one zero-shot pipeline.

[Figure: SimWeaver system architecture (Asset, Sim, Trajectory Generation, Domain Randomization, VLA, Real Deployment)]
SimWeaver-Asset — simulation-ready deformable assets, extensible by generation
SimWeaver-Sim — stable, penetration-free rigid–soft contact
SimWeaver-Syn — topology-aware, teleoperation-free trajectory generation
SimWeaver-Real — the sim-to-real deployment protocol
Method

The infrastructure behind zero-shot deformable manipulation.

COMPONENT · ASSET

SimWeaver-Asset

An extensible deformable-asset framework — 6,000+ simulation-ready assets, growable by generation.

Large-Scale Asset Library

6,000+ simulation-ready meshes with grounded, interpretable physical parameters, spanning garments and deformable bags. Bags are a deformable category prior datasets miss.

[Gallery · Garments: five garment examples · Bags: plastic bag, paper bag, canvas bag, knit bag, leather bag]

Generative Extension

A single image becomes a simulation-ready 3D mesh: not just geometry, but also the physical parameters needed to simulate it.

[Figure: two examples, a bag and a T-shirt, each shown as input photo (Image) → 3D mesh → Physics (stretch, bend, density) → Sim.]
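To make "simulation-ready" concrete, here is a hypothetical Python record for one asset. It pairs a triangle mesh with the three physical quantities named in the figure (stretch, bend, density). The field names, the units, the treatment of density as areal density, and the validation rules are illustrative assumptions, not SimWeaver-Asset's actual schema.

```python
import math
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PhysicsParams:
    """Hypothetical physical parameters, one per label in the figure."""
    stretch_stiffness: float  # in-plane resistance to stretching (assumed unit: N/m)
    bend_stiffness: float     # resistance to bending (assumed unit: N*m)
    density: float            # areal density of the sheet (assumed unit: kg/m^2)

    def __post_init__(self) -> None:
        # Reject non-physical values up front rather than inside the simulator.
        for name in ("stretch_stiffness", "bend_stiffness", "density"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive, got {getattr(self, name)}")

@dataclass
class DeformableAsset:
    """Hypothetical simulation-ready asset: triangle mesh plus physics parameters."""
    name: str
    category: str  # e.g. "garment" or "bag"
    vertices: list[tuple[float, float, float]]
    triangles: list[tuple[int, int, int]]
    physics: PhysicsParams
    tags: list[str] = field(default_factory=list)

    def is_simulation_ready(self) -> bool:
        """Minimal check: non-empty mesh whose triangle indices all point at real vertices."""
        n = len(self.vertices)
        return n > 0 and bool(self.triangles) and all(
            0 <= i < n for tri in self.triangles for i in tri
        )

    def surface_area(self) -> float:
        """Sum of triangle areas, each half the norm of the edge cross product."""
        area = 0.0
        for i, j, k in self.triangles:
            a, b, c = self.vertices[i], self.vertices[j], self.vertices[k]
            u = [b[d] - a[d] for d in range(3)]
            v = [c[d] - a[d] for d in range(3)]
            cross = (u[1] * v[2] - u[2] * v[1],
                     u[2] * v[0] - u[0] * v[2],
                     u[0] * v[1] - u[1] * v[0])
            area += 0.5 * math.sqrt(sum(x * x for x in cross))
        return area

    def total_mass(self) -> float:
        """Mass implied by areal density times surface area."""
        return self.physics.density * self.surface_area()

# A 1 m x 1 m patch split into two triangles; parameter values are placeholders.
patch = DeformableAsset(
    name="toy_patch",
    category="garment",
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
    triangles=[(0, 1, 2), (0, 2, 3)],
    physics=PhysicsParams(stretch_stiffness=50.0, bend_stiffness=1e-5, density=0.2),
)
print(patch.is_simulation_ready(), patch.surface_area(), patch.total_mass())
```

Keeping the parameters in a frozen, validated record makes them interpretable and easy to randomize or regress from an image without touching the mesh.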
COMPONENT · SIM

SimWeaver-Sim

Robust collision handling, penetration prevention, and trajectory-replay determinism — fixing the contact instability and non-deterministic replay that break thin fabrics in standard simulators.

[Video comparison (FULL, ARM R, ARM L views): Newton VBD · cloth penetration | Isaac Sim · grasp fails | SimWeaver-Sim · stable grasp]
RELIABILITY ON BIMANUAL GARMENT GRASPING
Simulator | Task ↑ | Grasp ↑ | Pen. ↓ | Expl. ↓ | Per-step (ms) ↓
Isaac Sim | 0.0% | 0.0% | 0.0% | 0.0% | 7.80
Newton VBD | 0.0% | 100.0% | 77.5% | 22.5% | 10.38
SimWeaver | 100.0% | 100.0% | 0.0% | 0.0% | 4.44
Penetration (Pen.) / explosion (Expl.): physically invalid contact failures, lower is better.
Per-step time: lower = faster sim, more demos per hour.
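The per-step column translates directly into throughput. The short Python sketch below converts the table's per-step times into steps per second and into each simulator's step time relative to SimWeaver. It assumes per-step time is the only cost and ignores rendering and reset overhead.

```python
# Per-step times from the table above (milliseconds, lower is better).
PER_STEP_MS = {"Isaac Sim": 7.80, "Newton VBD": 10.38, "SimWeaver": 4.44}

def steps_per_second(step_ms: float) -> float:
    """Simulation steps per wall-clock second, ignoring rendering and reset overhead."""
    return 1000.0 / step_ms

reference = PER_STEP_MS["SimWeaver"]
for sim, step_ms in PER_STEP_MS.items():
    ratio = step_ms / reference  # how many SimWeaver steps fit in one step of this simulator
    print(f"{sim:<11} {steps_per_second(step_ms):6.1f} steps/s  {ratio:4.2f}x SimWeaver step time")
```

With these numbers SimWeaver steps about 1.76× faster than Isaac Sim and 2.34× faster than Newton VBD (about 225 vs. 128 and 96 steps per second). Speed only matters once contacts are valid, though: both baselines score 0.0% on the task regardless of step time.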
COMPONENT · SYN

SimWeaver-Syn / TopoSynth

Topology-aware trajectory synthesis — deterministic demonstrations from a single seed.

No learned models

Pure topology graph + closed-form predicates — no prior to fit.

No teleoperation

Zero human demos at any stage of the pipeline.

No post-hoc filter

Single-seed synthesis — no over-generate-and-discard.
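The page does not spell out TopoSynth's graph or its predicates. As a purely hypothetical illustration of what a closed-form predicate over a garment topology graph can look like, the Python sketch below stores named landmarks as graph nodes and checks a fold-completion condition (paired corners within a distance tolerance) with no learned model. The landmark names, the fold pairs, and the 2 cm tolerance are all assumptions.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]

@dataclass
class GarmentGraph:
    """Hypothetical topology graph: named landmarks (nodes) and their adjacency (edges)."""
    landmarks: dict[str, Point]
    edges: set[frozenset[str]]

    def neighbors(self, name: str) -> set[str]:
        """Landmarks that share an edge with `name`."""
        return {other for e in self.edges if name in e for other in e if other != name}

def corners_aligned(g: GarmentGraph, pairs: list[tuple[str, str]], tol: float = 0.02) -> bool:
    """Closed-form predicate: each landmark pair lies within `tol` metres (no learned model)."""
    return all(math.dist(g.landmarks[a], g.landmarks[b]) <= tol for a, b in pairs)

# Flat T-shirt torso reduced to four corner landmarks (coordinates in metres).
flat = GarmentGraph(
    landmarks={
        "shoulder_left": (0.0, 0.6, 0.0), "shoulder_right": (0.5, 0.6, 0.0),
        "hem_left": (0.0, 0.0, 0.0), "hem_right": (0.5, 0.0, 0.0),
    },
    edges={frozenset(p) for p in [("shoulder_left", "shoulder_right"), ("shoulder_left", "hem_left"),
                                  ("shoulder_right", "hem_right"), ("hem_left", "hem_right")]},
)
# Bottom-up fold goal: each hem corner should land on the shoulder corner above it.
FOLD_PAIRS = [("hem_left", "shoulder_left"), ("hem_right", "shoulder_right")]

folded = GarmentGraph(
    landmarks={**flat.landmarks, "hem_left": (0.005, 0.59, 0.01), "hem_right": (0.495, 0.595, 0.01)},
    edges=flat.edges,
)
print(sorted(flat.neighbors("hem_left")))   # ['hem_right', 'shoulder_left']
print(corners_aligned(flat, FOLD_PAIRS))    # False: hems are 0.6 m from the shoulders
print(corners_aligned(folded, FOLD_PAIRS))  # True: both pairs are within 2 cm
```

Because a predicate like this is a fixed geometric test, the same state always yields the same verdict, which fits the "no learned models, no post-hoc filter" framing above.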

T-SHIRT FOLD · n=100
97.2% pass rate · 100/100 replay success
SYNTHESIS QUALITY · T-SHIRT FOLD · n=100
Method | Pass rate ↑ | Replay 100× ↑
SIM1 (learned + filter) | 24.0% | 13 / 100
SimWeaver-Syn | 97.2% | 100 / 100

Replay 100× = one successful trajectory re-executed from 100 fresh simulator resets.
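The Replay 100× metric follows a simple protocol: fix one successful action sequence, reset the simulator 100 times, replay the sequence open-loop, and count successes. Below is a hypothetical Python harness for that protocol. `Simulator` and `ToySim` are stand-ins invented for illustration, not part of SimWeaver or any real simulator API. `ToySim` has an optional noise knob that mimics non-deterministic contact resolution.

```python
import random
from typing import Optional, Protocol, Sequence

class Simulator(Protocol):
    """Minimal interface the harness needs (hypothetical, not a real simulator API)."""
    def reset(self) -> None: ...
    def step(self, action: float) -> None: ...
    def success(self) -> bool: ...

class ToySim:
    """Stand-in 1-D task: succeed if the final position is within 0.05 of 1.0.
    `noise` > 0 adds Gaussian jitter per step, mimicking non-deterministic contact."""

    def __init__(self, noise: float = 0.0, seed: Optional[int] = None) -> None:
        self.noise = noise
        self.rng = random.Random(seed)
        self.x = 0.0

    def reset(self) -> None:
        self.x = 0.0

    def step(self, action: float) -> None:
        self.x += action + (self.rng.gauss(0.0, self.noise) if self.noise > 0 else 0.0)

    def success(self) -> bool:
        return abs(self.x - 1.0) < 0.05

def replay_success(sim: Simulator, actions: Sequence[float], resets: int = 100) -> int:
    """Re-execute one fixed, previously successful action sequence open-loop from
    `resets` fresh resets and count how many replays succeed."""
    wins = 0
    for _ in range(resets):
        sim.reset()
        for action in actions:
            sim.step(action)
        if sim.success():
            wins += 1
    return wins

trajectory = [0.1] * 10  # succeeds in the noise-free ToySim
print(replay_success(ToySim(noise=0.0), trajectory))           # 100: deterministic replay
print(replay_success(ToySim(noise=0.05, seed=0), trajectory))  # typically far below 100
```

The harness never re-plans, so any drop below 100/100 can only come from the simulator itself, which is what the metric is meant to isolate.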

COMPONENT · REAL

SimWeaver-Real

A sim-to-real protocol that closes the deformable-specific gaps generic domain randomization leaves open.

AXIS 01 · INITIAL STATE

Physics-driven cloth init

drop-and-settle · pin-and-fold
AXIS 02 · IMAGE AUGMENTATION

Sensor-aware augmentation

[Figure: for each camera, a real frame beside six augmented sim frames (aug 0 to 5). Example perturbations: static_cam · BLUE −12.6; left_hand_cam · RED +19.5; right_hand_cam · FLICKER +12.3.]

Removing this augmentation collapses real-world success to 0% on all five tasks.
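The figure labels (BLUE −12.6, RED +19.5, FLICKER +12.3) suggest per-camera photometric perturbations with signed magnitudes. The page does not specify the exact operations, so the NumPy sketch below is a hypothetical reading of them. A color cast is an additive offset on one RGB channel, flicker is the same offset on all channels, and each camera draws its own perturbation independently. The perturbation kinds, the ±20 range, and the function names are assumptions.

```python
import numpy as np

CHANNEL = {"RED": 0, "GREEN": 1, "BLUE": 2}  # assumes RGB channel order
KINDS = ("RED", "GREEN", "BLUE", "FLICKER")

def channel_shift(img: np.ndarray, channel: str, delta: float) -> np.ndarray:
    """Add `delta` to one color channel of a uint8 HxWx3 image (a color-cast model)."""
    out = img.astype(np.float32)
    out[..., CHANNEL[channel]] += delta
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def flicker(img: np.ndarray, delta: float) -> np.ndarray:
    """Add the same `delta` to every channel (a crude exposure-flicker model)."""
    out = img.astype(np.float32) + delta
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def augment_cameras(frames: dict[str, np.ndarray], rng: np.random.Generator,
                    max_delta: float = 20.0):
    """Perturb each camera independently, so the static and wrist views never share one shift."""
    out, applied = {}, {}
    for cam, img in frames.items():
        kind = KINDS[int(rng.integers(len(KINDS)))]
        delta = float(rng.uniform(-max_delta, max_delta))
        out[cam] = flicker(img, delta) if kind == "FLICKER" else channel_shift(img, kind, delta)
        applied[cam] = (kind, round(delta, 1))
    return out, applied

# The three perturbations shown in the figure, applied to a flat gray frame.
gray = np.full((4, 4, 3), 128, dtype=np.uint8)
print(channel_shift(gray, "BLUE", -12.6)[0, 0])  # [128 128 115]
print(channel_shift(gray, "RED", 19.5)[0, 0])    # [148 128 128]
print(flicker(gray, 12.3)[0, 0])                 # [140 140 140]

frames = {"static_cam": gray, "left_hand_cam": gray, "right_hand_cam": gray}
_, applied = augment_cameras(frames, np.random.default_rng(0))
print(applied)  # one (kind, delta) per camera, sampled independently
```

Sampling per camera rather than per frame reflects the intuition that each physical sensor has its own color response and exposure behavior.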

AXIS 03 · SCENE RANDOMIZATION

Lighting · background · robot pose

Hardware: bimanual Piper 6-DOF arms with parallel-jaw grippers · 1 overhead + 2 wrist-mounted RealSense D435i cameras.
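Axes 01 and 03 amount to sampling an episode configuration before each simulated rollout. Below is a hypothetical Python sketch of such a sampler, plus a simple settle test for drop-and-settle initialization. Every range, threshold, and field name is a placeholder assumption, because the page does not publish SimWeaver-Real's actual distributions. The only grounded number is the 12 arm joints (two 6-DOF Piper arms).

```python
import random
from dataclasses import dataclass

NUM_JOINTS = 12  # two 6-DOF Piper arms (grippers excluded)

@dataclass(frozen=True)
class EpisodeConfig:
    init_mode: str                        # Axis 01: "drop_and_settle" or "pin_and_fold"
    drop_height_m: float                  # only meaningful for drop-and-settle
    cloth_yaw_deg: float
    light_intensity: float                # Axis 03: scale relative to nominal lighting
    light_color_temp_k: float
    background_id: int
    joint_offsets_rad: tuple[float, ...]  # Axis 03: perturbation of the robot home pose

def sample_episode(rng: random.Random, num_backgrounds: int = 50) -> EpisodeConfig:
    """Draw one randomized episode; all ranges are placeholder assumptions."""
    init_mode = rng.choice(["drop_and_settle", "pin_and_fold"])
    return EpisodeConfig(
        init_mode=init_mode,
        drop_height_m=rng.uniform(0.1, 0.4) if init_mode == "drop_and_settle" else 0.0,
        cloth_yaw_deg=rng.uniform(-180.0, 180.0),
        light_intensity=rng.uniform(0.5, 1.5),
        light_color_temp_k=rng.uniform(3000.0, 6500.0),
        background_id=rng.randrange(num_backgrounds),
        joint_offsets_rad=tuple(rng.gauss(0.0, 0.02) for _ in range(NUM_JOINTS)),
    )

def has_settled(max_speeds: list[float], threshold: float = 1e-3, window: int = 10) -> bool:
    """Hypothetical settle test for drop-and-settle: the cloth's maximum particle speed
    has stayed below `threshold` (m/s) for the last `window` simulation steps."""
    recent = max_speeds[-window:]
    return len(recent) == window and all(s < threshold for s in recent)

rng = random.Random(0)
for cfg in (sample_episode(rng) for _ in range(3)):
    print(cfg.init_mode, round(cfg.drop_height_m, 2), cfg.background_id)
print(has_settled([0.5, 0.1] + [5e-4] * 10))  # True
print(has_settled([5e-4] * 9))                # False: not enough quiet steps yet
```

Seeding a `random.Random` per episode keeps every randomized configuration reproducible, which pairs naturally with deterministic replay.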

Generalization · Silk Grasping

Sim-trained policies don’t just match real-data training — they surpass it under visual distribution shift.

Across texture, lighting, and rotation OOD shifts: real-robot teleop baseline (100 demos) drops to 13% / 70% / 9%. SimWeaver (200 sim demos + DR) holds at 100% on all three.

(A) SAMPLE EFFICIENCY

SimWeaver matches real-data efficiency.

Sim + DR scales as efficiently as teleop, and pushes further at the top end.

[Plot: success rate (0% to 100%) vs. number of demos (25, 50, 100, 200) for Real and Sim + DR; labeled values 91.3% and 100%.]
(B) DISTRIBUTION SHIFT

Real-data training collapses under visual shifts.
SimWeaver doesn’t.

Shift | Real | SimWeaver
Texture | 13.0% | 100%
Lighting | 69.6% | 100%
Rotation | 8.7% | 100%
SIDE-BY-SIDE · SAME TASK, OOD VISUAL SHIFT · 10× SPEED
[Video: SimWeaver · OOD · consistent success | Real-data baseline · OOD · fails]

Both clips: real-robot silk grasping at 10× speed. Across texture, lighting, and rotation OOD shifts, SimWeaver (200 sim demos + DR) holds 100% success on all three; the real-data baseline (100 real-robot teleop demos) collapses to 13% / 70% / 9% success — full per-axis breakdown in chart above. Shown here: the texture OOD condition.

Why RGB

Point-cloud baselines collapse on dark, absorbing surfaces.

Black grippers and far-field dark surfaces absorb the projected pattern. Both D435i (active stereo) and Photoneo (industrial structured-light) drop large regions of the scan. Geometry-only policies have nothing to act on.

0 / 5
DP3 fails
all five real tasks
Point-cloud baseline trained on the same 200 sim demos · none of the five tasks deployed successfully on real hardware.
[Figure: point cloud (PCD) with RGB reference inset for two sensors. D435i · consumer active-stereo: RealSense D435i point cloud failure on the black gripper. Photoneo · industrial structured-light: Photoneo point cloud failure on dark surfaces.]

Top-left inset shows the RGB view of the same scene; the main image shows what the depth sensor actually captured. Black gripper bodies and far dark objects vanish in both consumer and industrial scanners: the silk, which reflects diffusely, is captured, but the gripper and table edges drop out. Geometry-only policies end up acting on these holes.
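This failure mode is easy to quantify from a raw depth frame. RealSense depth frames use 0 for pixels without a valid measurement. The sketch assumes the same convention for any sensor and counts the zero pixels inside a region of interest (for example, a gripper mask) to measure how much of that object the point cloud loses. The NumPy code below uses a synthetic frame and hypothetical pinhole intrinsics (fx = fy = 600, cx = 320, cy = 240). It is illustrative, not captured data.

```python
from typing import Optional

import numpy as np

def invalid_fraction(depth: np.ndarray, mask: Optional[np.ndarray] = None) -> float:
    """Fraction of pixels with no depth reading (value 0) inside `mask`, or the whole frame."""
    region = np.ones(depth.shape, dtype=bool) if mask is None else mask.astype(bool)
    total = int(region.sum())
    if total == 0:
        raise ValueError("mask selects no pixels")
    return np.count_nonzero((depth == 0) & region) / total

def to_points(depth_mm: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Pinhole back-projection of valid pixels to an (N, 3) cloud in metres.
    Zero-depth pixels produce no points: that gap is the hole a geometric policy sees."""
    v, u = np.nonzero(depth_mm)
    z = depth_mm[v, u].astype(np.float64) / 1000.0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Synthetic 640x480 frame: table at 0.8 m, with most of a "gripper" region dropped to 0.
depth = np.full((480, 640), 800, dtype=np.uint16)
gripper = np.zeros_like(depth, dtype=bool)
gripper[300:400, 100:200] = True  # 10,000-pixel gripper mask
depth[300:390, 100:200] = 0       # 9,000 of those pixels return no depth

print(f"gripper invalid: {invalid_fraction(depth, gripper):.2f}")      # 0.90
print(f"frame invalid:   {invalid_fraction(depth):.4f}")               # 0.0293
print(to_points(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0).shape)  # (298200, 3)
```

A frame-level statistic (about 3% invalid here) hides the problem. The per-object view (90% of the gripper missing) explains why a geometry-only policy has nothing to act on.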

BibTeX
@misc{simweaver2026,
  title  = {SimWeaver: Zero-Shot RGB Sim-to-Real for Deformable Manipulation},
  author = {Wenkang Hu and Haoran Wang and Yitong Li and Liu Liu and Mengao Zhao and Lai Jiang and Xincheng Tang and Junhang Wei and Zhengjie Shu and Zhendong Wang and Zhizhong Su and Huamin Wang and Ruigang Yang},
  year   = {2026},
  note   = {Preprint}
}