pi0.5 RLT Build Block Tower 6-Mix (Joints-Only)
RLT (RL Token) encoder-decoder trained on top of the joints-only block-tower baseline checkpoint (joints_only/49999), using the same 6-dataset mix with loss restricted to the first 7 joint dimensions.
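Below is a minimal sketch of what restricting the loss to the first 7 joint dimensions means. It assumes a plain squared-error loss over a `(batch, action_horizon, 17)` action tensor. The helper names `joints_only_mask` and `masked_mse` are hypothetical, and the actual RLT objective in the training code may weight or reduce the loss differently.

```python
import numpy as np

ACTION_DIM = 17      # canonical action width used by this checkpoint
NUM_JOINT_DIMS = 7   # only these leading dims are supervised when joints_only=True


def joints_only_mask(action_dim: int = ACTION_DIM, num_active: int = NUM_JOINT_DIMS) -> np.ndarray:
    """Hypothetical helper: 0/1 mask with ones on the first `num_active` action dims."""
    mask = np.zeros(action_dim, dtype=np.float32)
    mask[:num_active] = 1.0
    return mask


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Hypothetical helper: mean squared error over unmasked action dims only.

    pred, target: shape (batch, action_horizon, action_dim)
    mask:         shape (action_dim,), broadcast over batch and horizon
    """
    sq_err = (pred - target) ** 2 * mask  # masked (EEF) dims contribute zero
    num_active = mask.sum() * pred.shape[0] * pred.shape[1]  # count of supervised entries
    return float(sq_err.sum() / num_active)
```

With this mask, errors in the 10 EEF dimensions have no effect on the loss. Only the 7 joint dimensions drive the gradient.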
Experiment
- Objective: Train RLT encoder-decoder with joints-only action supervision on the joints-only baseline.
- Weight init: `pravsels/build_block_tower_baseline_6mix_joints_only` checkpoint `49999` (joints-only baseline)
- Total steps: 50,000 (completed)
- Best val loss: 191.3 (step 45,000), the published checkpoint
- Final train loss: 104.8 (step 49,900)
Config
- Config name: `pi05_rlt_build_block_tower_6mix_joints_only`
- Model: `Pi0RLConfig` (pi0.5, `action_horizon=50`, `rl_vla_loss_weight=0.0`)
- VLA backbone: frozen (only the RLT encoder-decoder is trained)
- Batch size: 36
- Learning rate: 5e-5 peak, cosine decay (1k warmup steps, 10k decay steps)
- Optimizer: AdamW (gradient clip norm 1.0)
- EMA decay: 0.999
- Delta actions: enabled
- Episode split: 90/10 train/val (seed=42)
- Action space: 17D canonical (first 7 joint dims active, remaining 10 EEF dims masked; `joints_only: True`)
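The learning-rate entry can be read as a linear warmup followed by cosine decay. The sketch below rests on three assumptions. Warmup runs linearly from 0 over 1,000 steps. The 10,000-step decay horizon is counted from step 0. The rate then holds at a floor `end_lr`. The card does not state the floor, so the `5e-6` default is a hypothetical placeholder, and the training code's schedule may differ in these details.

```python
import math


def warmup_cosine_lr(
    step: int,
    peak_lr: float = 5e-5,       # from the config above
    warmup_steps: int = 1_000,   # from the config above
    decay_steps: int = 10_000,   # assumed to be counted from step 0, warmup included
    end_lr: float = 5e-6,        # hypothetical floor; not stated on this card
) -> float:
    """Sketch of a linear-warmup + cosine-decay learning-rate schedule."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the cosine phase completed, clamped so the LR holds at end_lr afterwards.
    progress = min((step - warmup_steps) / (decay_steps - warmup_steps), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1.0 down to 0.0
    return end_lr + (peak_lr - end_lr) * cosine
```

Under these assumptions the schedule reaches 5e-5 at step 1,000 and stays at `end_lr` from step 10,000 onward.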
Dataset
Six Hugging Face datasets: `villekuosmanen/build_block_tower` plus `dAgger_build_block_tower_1.0.0` through `1.4.0` (340 episodes total).
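Below is a sketch of a seeded, episode-level 90/10 split. Splitting by episode keeps all frames from one episode on the same side of the train/val boundary. This version uses Python's `random` module and is not expected to reproduce the exact split used in training; rely on the split saved under `assets/`. The function name `split_episodes` is hypothetical.

```python
import random


def split_episodes(num_episodes: int, val_fraction: float = 0.1, seed: int = 42):
    """Hypothetical helper: deterministic episode-level train/val split."""
    indices = list(range(num_episodes))
    random.Random(seed).shuffle(indices)           # fixed seed -> same split every run
    num_val = round(num_episodes * val_fraction)   # size of the held-out set
    val = sorted(indices[:num_val])
    train = sorted(indices[num_val:])
    return train, val


train_eps, val_eps = split_episodes(340)
```

For the 340 episodes here, this sketch yields 306 training and 34 validation episodes.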
Checkpoint Hashes
Verify integrity with:
```bash
cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum
```
| Step | Train Loss | Val Loss | SHA-256 |
|---|---|---|---|
| 45,000 | 108.7 | 191.3 | 75a3d6e1504ff4646f5276f02a42376a0c38db68d951c2da8c04eb212c6b63c6 |
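The shell pipeline above can be mirrored in Python, which helps where `find` or `sha256sum` are unavailable. The sketch assumes two things. First, GNU `sha256sum` prints one `<hex>  <path>` line per file in text mode. Second, paths are sorted in byte order, as with `LC_ALL=C sort`. Under other locales `sort` may order the paths differently and produce a different digest. The function names are hypothetical.

```python
import hashlib
import os
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of one file, read in chunks so large weight shards fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checkpoint_digest(step_dir: str) -> str:
    """Hypothetical helper mirroring `find params -type f | sort | xargs sha256sum | sha256sum`."""
    root = Path(step_dir)
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root / "params"):
        for name in filenames:
            full = Path(dirpath) / name
            # `find -type f` lists regular files only, so skip symlinks.
            if full.is_file() and not full.is_symlink():
                rel_paths.append(full.relative_to(root).as_posix())  # e.g. "params/..."
    # Byte-order sort (as `LC_ALL=C sort`), then rebuild sha256sum's "<hex>  <path>\n" lines.
    listing = "".join(f"{file_sha256(root / p)}  {p}\n" for p in sorted(rel_paths))
    return hashlib.sha256(listing.encode("utf-8")).hexdigest()
```

If the files and sort order match, `checkpoint_digest("checkpoints/45000")` should return the SHA-256 listed in the table.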
W&B
Repo Structure
```
assets/                    # Norm stats, per-timestep stats, episode split, valid indices
checkpoints/45000/params/  # Model weights (params only)
README.md                  # This file
TRAINING_LOG.md            # Training log
```
Usage
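```python
from openpi.training.config import get_config
from openpi.serving.policy_server import PolicyServer

# Load the training config this checkpoint was produced with.
config = get_config("pi05_rlt_build_block_tower_6mix_joints_only")
# Serve the published step-45,000 weights (params only, no optimizer state).
server = PolicyServer(config, checkpoint_path="checkpoints/45000/params")
```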