pi0.5 RLT Build Block Tower 6-Mix — Joints-Only

An RLT (RL Token) encoder-decoder trained on top of the joints-only block-tower baseline checkpoint (joints_only/49999). It uses the same 6-dataset mix as the baseline, with the loss restricted to the first 7 joint dimensions.
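
To illustrate what restricting the loss to the first 7 joint dimensions means, here is a minimal sketch of a masked loss over one 17D action chunk. It uses a plain mean squared error as a stand-in; the loss actually used in training is not specified here, and the helper names `joints_only_mask` and `masked_mse` are hypothetical.

```python
ACTION_DIM = 17      # canonical action width from the Config section
NUM_JOINT_DIMS = 7   # leading joint dims that receive supervision


def joints_only_mask(action_dim=ACTION_DIM, num_joints=NUM_JOINT_DIMS):
    """Return a 0/1 mask keeping the first `num_joints` action dims."""
    return [1.0 if i < num_joints else 0.0 for i in range(action_dim)]


def masked_mse(pred, target, mask):
    """Mean squared error over the unmasked dims of one action chunk.

    pred, target: nested lists of shape (horizon, action_dim).
    mask: list of length action_dim with 1.0 for supervised dims.
    MSE is an illustrative stand-in, not the confirmed training loss.
    """
    total, count = 0.0, 0.0
    for p_row, t_row in zip(pred, target):
        for p, t, m in zip(p_row, t_row, mask):
            total += m * (p - t) ** 2   # masked dims contribute nothing
            count += m                  # normalise by supervised entries only
    return total / count
```

With this mask, errors in the 10 masked EEF dimensions have no effect on the loss value, however large they are.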

Experiment

Objective: Train RLT encoder-decoder with joints-only action supervision on the joints-only baseline.
Weight init: pravsels/build_block_tower_baseline_6mix_joints_only checkpoint 49999 (joints-only baseline).
Total steps: 50,000 (completed)
Best val loss: 191.3 (step 45,000) — published checkpoint
Final train loss: 104.8 (step 49,900)

Config

Config name: pi05_rlt_build_block_tower_6mix_joints_only
Model: Pi0RLConfig (pi0.5, action_horizon=50, rl_vla_loss_weight=0.0)
VLA backbone: frozen (only the RLT encoder-decoder is trained)
Batch size: 36
Learning rate: 5e-5, cosine decay (1k warmup steps, 10k decay steps)
Optimizer: AdamW (gradient clip norm 1.0)
EMA decay: 0.999
Delta actions: enabled
Episode split: 90/10 train/val (seed=42)
Action space: 17D canonical (first 7 joint dims active, remaining 10 EEF dims masked)
joints_only: True
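
The learning-rate entry above can be read as a warmup followed by a cosine decay. The sketch below is one plausible interpretation, not the confirmed schedule: it assumes a linear warmup from 0 to 5e-5 over 1,000 steps, then a cosine decay over the following 10,000 steps to a floor that is held for the rest of training. The floor value is not stated in this card, so `end_lr` is a required, hypothetical parameter, and whether the 10k decay steps include the warmup is also an assumption.

```python
import math

PEAK_LR = 5e-5          # from the Config section
WARMUP_STEPS = 1_000    # "1k warmup"
DECAY_STEPS = 10_000    # "10k decay", assumed to start after warmup


def lr_at(step, end_lr, peak_lr=PEAK_LR,
          warmup_steps=WARMUP_STEPS, decay_steps=DECAY_STEPS):
    """Learning rate at `step` under the assumed warmup + cosine schedule.

    end_lr is the post-decay floor; it is NOT given in the card and must
    be supplied by the caller.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to 1 once it ends.
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return end_lr + (peak_lr - end_lr) * cosine
```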

Dataset

6 HuggingFace datasets: villekuosmanen/build_block_tower plus dAgger_build_block_tower_1.0.0 through 1.4.0 (340 episodes total).
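
As a rough illustration of the 90/10 episode split with seed 42 applied to 340 episodes, the sketch below shuffles episode indices and holds out 10% for validation. The shuffling method is an assumption and `split_episodes` is a hypothetical helper, so it will not reproduce the actual split; the authoritative split is the one stored in assets/.

```python
import random


def split_episodes(num_episodes, val_fraction=0.1, seed=42):
    """Return (train, val) episode index lists for a seeded random split.

    Illustrative only: the real pipeline may shuffle or round differently.
    """
    indices = list(range(num_episodes))
    random.Random(seed).shuffle(indices)       # deterministic for a fixed seed
    num_val = round(num_episodes * val_fraction)
    val = sorted(indices[:num_val])
    train = sorted(indices[num_val:])
    return train, val


train_ids, val_ids = split_episodes(340)
print(len(train_ids), len(val_ids))
```

For 340 episodes this yields 306 training and 34 validation episodes.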

Checkpoint Hashes

Verify integrity with:

cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum
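
Where sha256sum is not available, the same digest can be approximated in Python. This is a sketch under stated assumptions: it mimics the `<hash>  <path>` lines that sha256sum prints (two spaces, trailing newline), sorts paths by code point (equivalent to running sort under LC_ALL=C), and assumes file names contain no backslashes or newlines, which sha256sum would escape. `checkpoint_digest` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 hex digest of one file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checkpoint_digest(step_dir):
    """Digest over every file under <step_dir>/params, mirroring the pipeline
    find params -type f | sort | xargs sha256sum | sha256sum
    """
    step_dir = Path(step_dir)
    rel_paths = sorted(
        p.relative_to(step_dir).as_posix()      # e.g. "params/..."
        for p in (step_dir / "params").rglob("*")
        if p.is_file()
    )
    outer = hashlib.sha256()
    for rel in rel_paths:
        # Same line format as sha256sum output: hash, two spaces, path.
        line = f"{file_sha256(step_dir / rel)}  {rel}\n"
        outer.update(line.encode("utf-8"))
    return outer.hexdigest()
```

Calling `checkpoint_digest("checkpoints/45000")` should then be compared against the SHA-256 value in the table below; if the two disagree, rerun the shell command with LC_ALL=C before concluding that the files differ.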

Step	Train Loss	Val Loss	SHA-256
45,000	108.7	191.3	`75a3d6e1504ff4646f5276f02a42376a0c38db68d951c2da8c04eb212c6b63c6`

W&B

Training dashboard

Repo Structure

assets/                        # Norm stats, per-timestep stats, episode split, valid indices
checkpoints/45000/params/      # Model weights (params only)
README.md                      # This file
TRAINING_LOG.md                # Training log

Usage

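from openpi.training.config import get_config
from openpi.serving.policy_server import PolicyServer

# Look up the training config by the name listed in the Config section.
config = get_config("pi05_rlt_build_block_tower_6mix_joints_only")
# Serve the published checkpoint (step 45,000, best validation loss).
server = PolicyServer(config, checkpoint_path="checkpoints/45000/params")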
