SmolVLA — SO-101 Space Decluttering

A SmolVLA policy fine-tuned on the SO-101 Space Decluttering Dataset v1 for language-conditioned pick-and-place decluttering on a 6-DoF SO-101 robotic arm. Trained with LeRobot.

Training Details

  • Policy: SmolVLA (Vision-Language-Action)
  • Steps: 20,000
  • Robot: SO-101 6-DoF leader-follower
  • Cameras: Dual-view — fixed top-view + wrist-mounted egocentric
  • Framework: LeRobot
  • Language conditioning: Task descriptions passed as natural language instructions
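As a rough illustration of how the settings above could map onto a LeRobot fine-tuning run, the sketch below assembles a `lerobot-train` command line. Only the dataset ID and the step count come from this card. The base checkpoint `lerobot/smolvla_base` and the output directory are assumptions, and the entry-point name and flags follow recent LeRobot releases and may differ in the version actually used for training.

```python
import shlex


def build_train_command(
    dataset_repo_id="ShubhamK32/so101_declutter_v1",  # from this card
    base_policy="lerobot/smolvla_base",  # ASSUMPTION: base checkpoint is not stated in this card
    steps=20_000,  # from this card
    output_dir="outputs/train/smolvla_so101_declutter",  # hypothetical path
):
    """Return the argv list for a LeRobot fine-tuning run (illustrative only)."""
    return [
        "lerobot-train",
        f"--policy.path={base_policy}",
        f"--dataset.repo_id={dataset_repo_id}",
        f"--steps={steps}",
        f"--output_dir={output_dir}",
    ]


if __name__ == "__main__":
    # Print a shell-ready command instead of launching training.
    print(shlex.join(build_train_command()))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without a GPU or LeRobot installed.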

Dataset

Trained on ShubhamK32/so101_declutter_v1 — a multi-view teleoperation dataset with spatial distractors injected to prevent visual shortcut learning.
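To inspect the dataset before training or evaluation, something like the following sketch may help. It assumes a recent LeRobot release where `LeRobotDataset` lives in `lerobot.datasets.lerobot_dataset` (older releases use a different module path). The helper names `load_declutter_dataset` and `summarize` are hypothetical and not part of LeRobot.

```python
DATASET_REPO_ID = "ShubhamK32/so101_declutter_v1"


def load_declutter_dataset(repo_id=DATASET_REPO_ID):
    """Download (or load from cache) the dataset from the Hugging Face Hub."""
    # Imported lazily so this file can be imported without LeRobot installed.
    # ASSUMPTION: module path as in recent LeRobot releases.
    from lerobot.datasets.lerobot_dataset import LeRobotDataset

    return LeRobotDataset(repo_id)


def summarize(dataset):
    """Collect a few basic statistics from a LeRobotDataset-like object."""
    return {
        "episodes": dataset.num_episodes,
        "frames": dataset.num_frames,
        "fps": dataset.fps,
        "cameras": list(dataset.meta.camera_keys),
    }


if __name__ == "__main__":
    print(summarize(load_declutter_dataset()))
```

For this dataset, the camera list would be expected to contain the two image keys described under Camera Views.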

Usage

# Load the fine-tuned SmolVLA policy from the Hugging Face Hub
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("ShubhamK32/smolvla_so101_declutter")

Camera Views

  • observation.images.topview: Fixed overhead camera. Better suited to unoccluded pick-and-place tasks.
  • observation.images.wristview: Egocentric wrist-mounted camera. Better suited to overlapping and cluttered scenes.
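The two camera keys above, together with the robot state and a language instruction, make up a single policy input. The sketch below shows one way to assemble that dictionary. The image keys come from this card, while `observation.state` and `task` are assumed from LeRobot's usual naming conventions, and `build_observation` is a hypothetical helper rather than a LeRobot function.

```python
TOP_KEY = "observation.images.topview"      # from this card
WRIST_KEY = "observation.images.wristview"  # from this card
STATE_KEY = "observation.state"  # ASSUMPTION: LeRobot's usual joint-state key
TASK_KEY = "task"                # ASSUMPTION: LeRobot's usual instruction key


def build_observation(top_image, wrist_image, joint_state, instruction):
    """Bundle both camera frames, the joint state, and the instruction."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("A language-conditioned policy needs a non-empty instruction.")
    return {
        TOP_KEY: top_image,
        WRIST_KEY: wrist_image,
        STATE_KEY: joint_state,
        TASK_KEY: instruction,
    }
```

With batched torch tensors as values (for example, images shaped `(1, 3, H, W)` and a 6-element joint state), a dictionary like this could then be passed to `policy.select_action(...)`. Depending on the LeRobot version, a preprocessing step for normalization and tokenization may be needed first.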

Model weights: Safetensors format, 0.5B parameters, tensor types F32 and BF16.