Upload design.md
design.md
ADDED
@@ -0,0 +1,153 @@
# OCC Design Document

## 1. Core Principles

1. **Verified Impact First**: Credits are earned only after an oracle verifies marginal value.
2. **Non-Transferable Credits**: Agents cannot launder credits through others.
3. **Decaying Credits**: Hoarding is discouraged; use-it-or-lose-it dynamics.
4. **Capability-Based Rights**: Rights are per-resource, not blanket access.
5. **Auditable Accounting**: Every credit change has provenance.

## 2. Impact Oracle

### Scoring Modes

**Code Tasks**
- `unit_test_pass`: binary pass/fail
- `pass_at_k`: fraction passing among k samples
- `regression`: does the new state break prior passing tests?
- `compute_comparison`: score normalized by tokens/FLOPs used

**Retrieval QA Tasks**
- `answer_correctness`: exact / fuzzy match to gold
- `evidence_support`: NLI entailment check on retrieved evidence
- `hallucination`: NLI contradiction or unsupported claims
- `abstention_utility`: correct abstention on unanswerable questions
- `calibration`: Brier score / ECE on confidence predictions
- `proper_score`: proper scoring rule reward

**Multi-Agent Debate Tasks**
- `decision_quality`: final answer correctness
- `influence_efficiency`: marginal contribution per token/compute
- `throughput`: decisions per compute unit

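The `calibration` mode names Brier score and ECE without defining them. The sketch below shows one common way to compute both for binary outcomes. The function names, the equal-width binning, and the default of 10 bins are assumptions made for illustration, not part of the design.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    return sum((c - y) ** 2 for c, y in zip(confidences, outcomes)) / len(confidences)


def expected_calibration_error(
    confidences: Sequence[float],
    outcomes: Sequence[int],
    n_bins: int = 10,  # assumed bin count; the design does not fix one
) -> float:
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

Lower is better for both quantities; a perfectly confident and always-correct predictor scores 0 on each.
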
### Reward Formula

```
reward = verified_task_score
       + abstention_utility
       + calibration_bonus
       - hallucination_penalty
       - confident_wrong_penalty
       - compute_cost_penalty
       - gaming_penalty

where:
  verified_task_score ∈ [0, 1]  (pass/fail or accuracy)
  abstention_utility ∈ {-1, 0, +1}  (+1 for correct abstain, -1 for incorrect abstain)
  calibration_bonus = (1 - brier_score) * 0.2
  hallucination_penalty = contradiction_score * 0.5
  confident_wrong_penalty = confidence * (1 - correct) * 0.3
  compute_cost_penalty = (cost / budget) * 0.2
  gaming_penalty = detected_pattern_penalty (see below)
```

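As a worked instance of the formula above, the following minimal sketch evaluates it term by term. The `OracleSignals` container and `compute_reward` name are hypothetical; the weights (0.2, 0.5, 0.3, 0.2) are the ones stated in the formula, and `gaming_penalty` is passed in as a precomputed number because the document defines it separately.

```python
from dataclasses import dataclass


@dataclass
class OracleSignals:
    """Hypothetical bundle of the inputs named in the reward formula."""
    verified_task_score: float   # in [0, 1]
    abstention_utility: int      # one of -1, 0, +1
    brier_score: float
    contradiction_score: float
    confidence: float
    correct: bool
    cost: float
    budget: float
    gaming_penalty: float = 0.0  # output of gaming detection, assumed precomputed


def compute_reward(s: OracleSignals) -> float:
    calibration_bonus = (1.0 - s.brier_score) * 0.2
    hallucination_penalty = s.contradiction_score * 0.5
    confident_wrong_penalty = s.confidence * (1 - int(s.correct)) * 0.3
    compute_cost_penalty = (s.cost / s.budget) * 0.2
    return (
        s.verified_task_score
        + s.abstention_utility
        + calibration_bonus
        - hallucination_penalty
        - confident_wrong_penalty
        - compute_cost_penalty
        - s.gaming_penalty
    )


# Correct answer at confidence 0.8, using half the budget.
good = OracleSignals(1.0, 0, 0.04, 0.0, 0.8, True, cost=50, budget=100)
# Wrong, contradicted answer at confidence 0.9, using the whole budget.
bad = OracleSignals(0.0, 0, 0.81, 1.0, 0.9, False, cost=100, budget=100)
```

With these inputs the first case works out to 1 + 0.192 - 0.1 = 1.092 and the second to 0.038 - 0.5 - 0.27 - 0.2 = -0.932, so a confident hallucination ends up well below zero.
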
### Gaming Detection

- **Spam**: repeated low-value actions within a short window → penalty
- **Hoarding**: credit balance above threshold for N epochs → decay acceleration
- **Transfer**: indirect credit laundering via coordinated task submission → ban
- **Judge exploitation**: output distribution shift toward weak-judge preferences → KL penalty
- **Over-abstention**: abstention rate > threshold → negative reward
- **Verbose padding**: impact per token below threshold → penalty

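Two of the detectors listed above reduce to simple rate checks over an agent's recent history. The sketch below illustrates over-abstention and verbose padding only; the `ActionRecord` fields, the flag names, and both threshold values are assumptions, since the document does not specify them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ActionRecord:
    """Hypothetical per-action summary kept for gaming detection."""
    abstained: bool
    tokens: int
    impact: float  # oracle-verified marginal value of the action


def gaming_flags(
    history: Sequence[ActionRecord],
    max_abstention_rate: float = 0.5,     # assumed threshold
    min_impact_per_token: float = 0.001,  # assumed threshold
) -> set:
    """Return the names of the gaming patterns detected in `history`."""
    flags = set()
    if not history:
        return flags

    abstention_rate = sum(a.abstained for a in history) / len(history)
    if abstention_rate > max_abstention_rate:
        flags.add("over_abstention")

    total_tokens = sum(a.tokens for a in history)
    total_impact = sum(a.impact for a in history)
    # Padding shows up as many tokens spent for little verified impact.
    if total_tokens > 0 and total_impact / total_tokens < min_impact_per_token:
        flags.add("verbose_padding")

    return flags
```

The remaining patterns (spam windows, hoarding epochs, judge-preference drift) need time-indexed or distributional state and are not covered by this sketch.
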
## 3. Credit Ledger

### Schema

Each entry: `(agent_id, task_id, action_id, earned, spent, decayed, remaining, reason, oracle_score, compute_cost, timestamp, capability_scope)`

### Rules

1. **Non-transferable**: `transfer(from, to, amount)` always returns `False`.
2. **Decay**: `remaining *= exp(-lambda * delta_t)` each evaluation cycle.
3. **Task scope**: credits earned in task A cannot fund task B unless explicitly pooled.
4. **Capability scope**: credits for "retrieval" cannot fund "file_write".
5. **Revocation**: negative outcomes can revoke credits retroactively within a window.
6. **Provenance**: every entry references an oracle decision hash.

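Rules 1 through 4 can be illustrated with a small in-memory ledger. This is a sketch under stated assumptions, not the project's implementation: the `Ledger` class, its method names other than `transfer`, and the choice to key balances by `(agent_id, task_id, capability_scope)` are hypothetical. Revocation, pooling, and provenance hashes are left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Ledger:
    decay_rate: float  # the `lambda` in the decay rule
    # Balances keyed by (agent_id, task_id, capability_scope): an assumed layout
    # that makes the task-scope and capability-scope rules fall out of the key.
    balances: dict = field(default_factory=dict)

    def earn(self, agent_id: str, task_id: str, scope: str, amount: float) -> None:
        key = (agent_id, task_id, scope)
        self.balances[key] = self.balances.get(key, 0.0) + amount

    def transfer(self, src: str, dst: str, amount: float) -> bool:
        """Rule 1: credits are non-transferable, so this always refuses."""
        return False

    def decay(self, delta_t: float) -> None:
        """Rule 2: remaining *= exp(-lambda * delta_t) for every balance."""
        factor = math.exp(-self.decay_rate * delta_t)
        for key in self.balances:
            self.balances[key] *= factor

    def spend(self, agent_id: str, task_id: str, scope: str, amount: float) -> bool:
        """Rules 3 and 4: only credits with the same task and scope can be spent."""
        key = (agent_id, task_id, scope)
        if self.balances.get(key, 0.0) < amount:
            return False
        self.balances[key] -= amount
        return True


ledger = Ledger(decay_rate=0.1)
ledger.earn("agent-1", "task-A", "retrieval", 10.0)
```

In this layout, spending "retrieval" credits on `file_write` fails simply because no balance exists under that key.
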
## 4. Resource Broker

### Decision Matrix

| Condition | Decision |
|-----------|----------|
| credit >= threshold, low risk | `allow` |
| credit < threshold, low risk | `deny` |
| credit >= threshold, high risk | `require_approval` |
| credit >= threshold, suspicious pattern | `downgrade` or `escalate` |
| emergency override | `escalate` |

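The matrix above can be read as a small decision function. The sketch below is one possible encoding and fills three gaps with assumptions that the table does not settle: emergency override is checked first, insufficient credit is denied regardless of risk, and a suspicious pattern is downgraded at low risk but escalated at high risk. The `decide` name and its parameters are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"
    DOWNGRADE = "downgrade"
    ESCALATE = "escalate"


def decide(
    credit: float,
    threshold: float,
    high_risk: bool = False,
    suspicious: bool = False,
    emergency_override: bool = False,
) -> Decision:
    # Assumption: an emergency override takes precedence over every other row.
    if emergency_override:
        return Decision.ESCALATE
    # Assumption: the table only lists the low-risk case for insufficient
    # credit; this sketch denies the high-risk case as well.
    if credit < threshold:
        return Decision.DENY
    # Assumption: the table allows either outcome for suspicious patterns;
    # risk level is used here as the tie-breaker.
    if suspicious:
        return Decision.ESCALATE if high_risk else Decision.DOWNGRADE
    if high_risk:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

For example, `decide(credit=5, threshold=3)` yields `Decision.ALLOW`, while the same call with `high_risk=True` yields `Decision.REQUIRE_APPROVAL`.
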
### Resources

- `model_call_small` / `model_call_large`
- `retrieval_call`
- `verifier_call`
- `debate_turn`
- `file_write`
- `shell_execute`
- `memory_write`
- `human_escalation`

## 5. GRPO Hook

We implement a reward function compatible with TRL's GRPOTrainer that maps Oracle outputs to per-group rewards. Since full training may be compute-limited, we provide:

1. `reward_fn(completions, oracle_scores)`: returns a tensor of rewards
2. `GRPOHook` class: wraps Oracle + Ledger + Broker for online evaluation
3. `OfflineComparator`: compares policies using saved trajectories when training is infeasible

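A dependency-free sketch of the first item is shown below. It returns a plain list of floats rather than a tensor so that it runs without `torch`, and it accepts extra keyword arguments because, to our understanding, TRL passes additional fields to custom reward functions that way; check the TRL documentation for the exact signature your version expects. The `group_advantages` helper is hypothetical and only illustrates what "per-group" means in GRPO (rewards normalized within each group of completions for the same prompt); the trainer normally performs this step itself, and the use of population standard deviation here is an assumption.

```python
from statistics import mean, pstdev
from typing import Any, Sequence


def reward_fn(
    completions: Sequence[Any],
    oracle_scores: Sequence[float],
    **kwargs: Any,  # extra fields a trainer may pass; ignored in this sketch
) -> list:
    """Map one oracle score to one reward per completion."""
    if len(completions) != len(oracle_scores):
        raise ValueError("expected one oracle score per completion")
    return [float(score) for score in oracle_scores]


def group_advantages(rewards: Sequence[float], group_size: int, eps: float = 1e-4) -> list:
    """Illustrative group-relative normalization: (r - group mean) / (group std + eps)."""
    if len(rewards) % group_size != 0:
        raise ValueError("rewards must split evenly into groups")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu, sigma = mean(group), pstdev(group)
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


rewards = reward_fn(["a", "b", "c", "d"], [1.0, 0.0, 0.5, 0.5])
advantages = group_advantages(rewards, group_size=2)
```

In the example, the first group has one strong and one weak completion, so their advantages have opposite signs; the second group's rewards are identical, so both advantages are zero.
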
## 6. Benchmarks

### Benchmark 1: Code Compute Allocation
- Dataset: `openai/openai_humaneval` or `evalplus/humanevalplus`
- Baselines: fixed compute, verifier retries, OCC allocation
- Metrics: pass@1, pass@k, tokens used, model calls, cost, compute saved at iso-accuracy

### Benchmark 2: Retrieval QA
- Dataset: synthetic grounded QA + adversarial evidence
- Baselines: direct answer, RAG, RAG+verifier, OCC
- Metrics: correctness, hallucination rate, abstention utility, ECE, retrieval calls, cost

### Benchmark 3: Multi-Agent Debate
- Dataset: synthetic factual disputes + code debates
- Baselines: equal turns, majority vote, confidence-weighted, OCC
- Metrics: decision quality, compute used, quality per GPU-second, bad-agent containment

## 7. Ablations

1. No credit ledger (oracle score used directly)
2. Transferable credits
3. Non-decaying credits
4. No abstention reward
5. No calibration penalty
6. No cost penalty
7. No anti-gaming penalty
8. No broker (oracle score only)
9. Broker with static rules
10. Broker with learned/score-based rights

## 8. Anti-Gaming Tests

- Spam low-value actions
- Hoard credits
- Transfer credit indirectly
- Exploit weak judge
- Verbose but low-value debate turns
- Over-abstention
- Overuse retrieval
- Manipulate confidence
- Optimize for unit tests while breaking hidden tests
- Collude in multi-agent debate

Measure: gaming success rate, credit leakage, robustness under judge replacement, quality degradation, broker containment.