
OCC Collapse Mechanism — Runbook

Session: 2026-05-11

JOBS RUNNING (5 total, all session e95fd6cc)

| Job ID | Hardware | Script | Timeout | Status |
|---|---|---|---|---|
| 6a0236d6317220dbbd1a7c07 | H200 | occ_debate_collapse_mechanism_v3.py | 24h | RUNNING |
| 6a0236d6aff1cd33e8f33ee6 | a10g-large | occ_cheap_baselines.py | 6h | RUNNING |
| 6a0236d6317220dbbd1a7c09 | a10g-large | occ_strong_baselines.py | 6h | RUNNING |
| 6a022292aff1cd33e8f33ded | a10g-large | occ_strong_baselines.py (older) | 6h | RUNNING |
| 6a022033317220dbbd1a7b8c | a10g-large | occ_cheap_baselines.py (older) | 6h | RUNNING |

DO NOT SUBMIT NEW JOBS until these complete. Session ID e95fd6cc is shared by all five jobs: submitting a new job WILL cancel every running job.

Data locations (on narcolepticchicken/occ-stack Hub)

| File | Produced by |
|---|---|
| reports/debate_collapse_mechanism_results.json | Mechanism v3 (pushes incrementally after each condition) |
| reports/cheap_baselines_results.json | Cheap baselines |
| reports/strong_baselines_results.json | Strong baselines |
| reports/debate_extended_baselines_2seed.json | Pre-existing v2 data (88.3% → 56.7% collapse) |

When mechanism data arrives, run:

# 1. Download results
python -c "
from huggingface_hub import hf_hub_download
p = hf_hub_download('narcolepticchicken/occ-stack', 'reports/debate_collapse_mechanism_results.json')
import shutil; shutil.copy(p, './debate_collapse_mechanism_results.json')
"

# 2. Run the analysis harness (v2.1, handles v2+v3 formats)
python jobs/analyze_collapse.py debate_collapse_mechanism_results.json

# 3. Outputs: reports/analysis/
#    - condition_summary.csv
#    - per_topic_outcomes.csv
#    - round_flip_matrix.csv
#    - hypothesis_verdicts.json
#    - fig_accuracy_by_condition.png
#    - fig_honest_retention.png
#    - fig_flip_rate.png
#    - fig_adversary_skill.png

Then fill v13 memo:

# Fill {VALUE} placeholders in reports/v13_mechanism_memo.md
# Data comes from: reports/analysis/condition_summary.csv + hypothesis_verdicts.json
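The memo-filling step can be sketched as a small substitution helper. This is only an illustration under stated assumptions: the placeholder syntax `{NAME}` with upper-case names, the function name `fill_placeholders`, and the example key `ACC_1ROUND` are all hypothetical, and the real template in reports/v13_mechanism_memo.md may use a different convention. Unknown placeholders are left intact and reported, so a partially filled memo is easy to spot.

```python
import re

# Hypothetical placeholder syntax: {UPPER_CASE_NAME}. Adjust to the real template.
PLACEHOLDER = re.compile(r"\{([A-Z0-9_]+)\}")


def fill_placeholders(template: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Replace known placeholders; return the filled text and the names left unfilled."""
    missing: list[str] = []

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in values:
            return str(values[name])
        missing.append(name)
        return match.group(0)  # leave unknown placeholders untouched

    return PLACEHOLDER.sub(substitute, template), missing


# Illustrative call using the 2-seed pilot value, not final mechanism results.
text, unfilled = fill_placeholders(
    "1-round: {ACC_1ROUND}%, 3-round: {ACC_3ROUND}%",
    {"ACC_1ROUND": "88.3"},
)
print(text)
print("still unfilled:", unfilled)
```

Returning the list of unfilled names makes it possible to refuse to publish the memo while any placeholder remains.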

Infrastructure

  • Analysis harness: jobs/analyze_collapse.py (v2.1 - handles per_seed and seeds keys)
  • v13 memo template: reports/v13_mechanism_memo.md
  • All scripts: narcolepticchicken/occ-stack on Hub
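The note that the harness handles both `per_seed` and `seeds` keys suggests a lookup along the following lines. This is a minimal sketch, not the code in jobs/analyze_collapse.py: the helper name `seed_records` and the assumption that each condition is a dict holding its per-seed data under one of those two keys are hypothetical.

```python
from typing import Any

# The two key names come from the runbook; everything else here is assumed.
SEED_KEYS = ("per_seed", "seeds")


def seed_records(condition: dict[str, Any]) -> Any:
    """Return the per-seed payload of one condition, whichever key it uses."""
    for key in SEED_KEYS:
        if key in condition:
            return condition[key]
    raise KeyError(
        f"none of {SEED_KEYS} found; keys present: {sorted(condition)}"
    )


# Both layouts resolve to the same payload.
print(seed_records({"per_seed": [0.9, 0.85]}))
print(seed_records({"seeds": [0.9, 0.85]}))
```

Failing loudly on an unrecognised layout is a deliberate choice here: a silent default would hide a format change in a later script version.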

Pre-registered hypotheses

Evaluated automatically by the analysis harness, using the rules in its HYPOTHESIS_RULES dict:

| Hypothesis | Mechanism |
|---|---|
| H1: Volume amplification | equal_token_unequal_turn vs baseline_1round_traced |
| H2: Turn-order effect | randomized_order_3round vs equal_3round_traced |
| H3: Voting vulnerability | judge_vote + confidence_weighted vs equal_3round_traced |
| H4: Contamination | Honest retention rate round 3 |
| H5: Confidence distortion | confidence_weighted vs equal_3round_traced |
| H6: Skill dependency | weak vs normal vs strong vs oracle adversary |
| H7: Topic vulnerability | Per-topic variance in collapse |
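Because the mechanism job pushes results incrementally, it can help to check which hypotheses already have all of their conditions before running the harness. The sketch below is an assumption-laden illustration, not the real HYPOTHESIS_RULES: it assumes the results JSON is keyed by condition name at the top level, it takes the condition names verbatim from the table above (the actual keys may carry suffixes such as `_3round`), and it covers only the hypotheses whose conditions the table names explicitly.

```python
# Condition names copied from the hypothesis table; exact JSON keys are assumed.
REQUIRED_CONDITIONS = {
    "H1": ["equal_token_unequal_turn", "baseline_1round_traced"],
    "H2": ["randomized_order_3round", "equal_3round_traced"],
    "H3": ["judge_vote", "confidence_weighted", "equal_3round_traced"],
    "H5": ["confidence_weighted", "equal_3round_traced"],
}


def missing_conditions(results: dict) -> dict[str, list[str]]:
    """Map each hypothesis to the conditions not yet present in the results."""
    return {
        hypothesis: [name for name in conditions if name not in results]
        for hypothesis, conditions in REQUIRED_CONDITIONS.items()
    }


# Example with a partially populated results dict.
partial = {"baseline_1round_traced": {}, "equal_3round_traced": {}}
for hypothesis, missing in missing_conditions(partial).items():
    status = "ready" if not missing else f"waiting on {missing}"
    print(hypothesis, status)
```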

Expected results (from 2-seed pilot)

  • 1-round baseline: 88.3% accuracy
  • 3-round equal: 56.7% accuracy (31.6pp collapse)
  • Random 25% drop: 85.0% with 26.5% token savings
  • OCC credit: prevents catastrophe but doesn't beat random gating at moderate budgets
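The pilot figures above can be cross-checked with a few lines of arithmetic. This is a small worked example using only the numbers listed in this section; the helper name `collapse` is hypothetical, and the relative drop is an extra derived quantity rather than something the pilot reported.

```python
def collapse(before: float, after: float) -> tuple[float, float]:
    """Return (drop in percentage points, drop relative to `before` in percent)."""
    drop_pp = before - after
    relative = drop_pp / before * 100
    return round(drop_pp, 1), round(relative, 1)


# 1-round baseline vs 3-round equal (pilot values from the list above).
print(collapse(88.3, 56.7))  # (31.6, 35.8)

# 1-round baseline vs random 25% drop, which also saves 26.5% of tokens.
print(collapse(88.3, 85.0))  # (3.3, 3.7)
```

On these numbers, random gating gives up 3.3pp of accuracy for the 26.5% token savings, against 31.6pp lost in the 3-round equal condition.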

Key fix: judge_vote_3round

In v2 this condition scored 0/30 because extract_position() only checked first-line prefixes. v3 instead uses extract_judge_answer(), which matches the regex \b(yes|no)\b and breaks ties by taking the last occurrence. The judge prompt now ends with "Based on the debate, the correct answer is: " and the judge generates 32 tokens at temperature 0.1.
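The extraction rule described above can be sketched as follows. This is a minimal reconstruction from the description, not the code in occ_debate_collapse_mechanism_v3.py: the case-insensitive matching, the lower-cased return value, and returning `None` when nothing matches are assumptions.

```python
import re
from typing import Optional

# Regex from the runbook; IGNORECASE is an assumption of this sketch.
_YES_NO = re.compile(r"\b(yes|no)\b", re.IGNORECASE)


def extract_judge_answer(text: str) -> Optional[str]:
    """Return 'yes' or 'no' from judge output, preferring the last occurrence."""
    matches = _YES_NO.findall(text)
    if not matches:
        return None  # assumed behaviour when the judge never says yes or no
    return matches[-1].lower()


# The answer need not be on the first line or at the start of a line.
print(extract_judge_answer("Weighing both sides,\nthe correct answer is: Yes."))
# When both words appear, the last occurrence wins.
print(extract_judge_answer("One side says yes, but the evidence says no."))
```

The word boundaries keep substrings such as the "no" in "nothing" from counting as an answer, and taking the last match favours the judge's concluding statement over positions it merely quotes from the debate.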