design.md
CHANGED
|
@@ -1,132 +1,286 @@
|
|
| 1 |
-
# OCC
|
| 2 |
|
| 3 |
-
##
|
| 4 |
|
| 5 |
-
Compute is
|
| 6 |
|
| 7 |
-
## Core
|
| 8 |
|
| 9 |
-
|
| 10 |
|
| 11 |
-
|
| 12 |
-
1. **Oracle:** Score the marginal impact of an action
|
| 13 |
-
2. **Credit:** Update the agent's credit balance based on that score
|
| 14 |
-
3. **Compute:** The broker decides whether to grant the requested resource
|
| 15 |
|
| 16 |
-
##
|
| 17 |
|
| 18 |
-
###
|
| 19 |
|
| 20 |
-
|
| 21 |
|
| 22 |
-
|
| 23 |
|
| 24 |
-
|
| 25 |
-
- **Cost-adjusted scores:** Every score subtracts a compute cost penalty. This prevents agents from achieving correctness through brute-force token spending.
|
| 26 |
-
- **Proper scoring rules:** Calibration bonus via Brier score encourages well-calibrated confidence, not just correctness.
|
| 27 |
-
- **Anti-gaming detectors:** Explicit checks for hidden-test gaming, spam, collusion, and over-abstention.
|
| 28 |
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
- **Non-transferable:** `transfer()` always returns `False`. This prevents colluding agents from pooling credits or laundering them through intermediaries.
|
| 34 |
-
- **Exponential decay:** Idle credits decay at rate λ per time step. This prevents hoarding and encourages agents to use credits or lose them.
|
| 35 |
-
- **Capability-scoped:** Credits are scoped to specific capabilities (`retrieval`, `model_call`, `file_write`). An agent that is good at retrieval should not automatically get dangerous write permissions.
|
| 36 |
-
- **Full provenance:** Every entry has an oracle score, compute cost, timestamp, and reason. This enables auditing and debugging.
|
| 37 |
|
| 38 |
-
|
| 39 |
|
| 40 |
-
|
| 41 |
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
- **Medium:** `model_call`, `verifier_call`, `memory_write` → threshold 2.0 credits
|
| 45 |
-
- **High:** `file_write`, `shell_execute`, `human_escalation` → threshold 5.0 credits, may require approval
|
| 46 |
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
- `DENY`: credits < threshold × 0.5
|
| 50 |
-
- `REQUIRE_APPROVAL`: high-risk + high risk score
|
| 51 |
-
- `DOWNGRADE`: credits between 0.5× and 1.0× threshold → downgrade to cheaper resource
|
| 52 |
-
- `ESCALATE`: repeated denials from same agent
|
| 53 |
-
- `ASK_JUSTIFICATION`: credits insufficient but agent has some history
|
| 54 |
|
| 55 |
-
|
| 56 |
|
| 57 |
-
|
| 58 |
|
| 59 |
-
|
|
|
|
| 60 |
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
|
|
|
| 65 |
|
| 66 |
-
##
|
| 67 |
|
| 68 |
```
|
| 69 |
-
|
| 70 |
-
verified_task_score
|
| 71 |
-
+ abstention_utility
|
| 72 |
-
+ calibration_bonus
|
| 73 |
-
- hallucination_penalty
|
| 74 |
-
- confident_wrong_penalty
|
| 75 |
-
- compute_cost_penalty
|
| 76 |
-
- gaming_penalty
|
| 77 |
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
calibration_bonus = (1 - brier_score) * weight_calibration
|
| 82 |
-
brier_score = (confidence - outcome)^2
|
| 83 |
-
hallucination_penalty = 2.0 if entailment < 0.5 and contradiction > 0.5
|
| 84 |
-
confident_wrong_penalty = 3.0 if confidence > 0.8 and correctness < 0.5
|
| 85 |
-
compute_cost_penalty = compute_cost * 0.0001
|
| 86 |
-
gaming_penalty = 2.0 if hidden_tests fail while public pass
|
| 87 |
```
|
| 88 |
|
| 89 |
-
##
|
| 90 |
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
- Containment: Oracle subtracts gaming_penalty. Ledger can revoke all credits on explicit detection.
|
| 94 |
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
|
|
|
|
|
|
| 98 |
|
| 99 |
-
###
|
| 100 |
-
- Prevention: `transfer()` returns `False` unconditionally.
|
| 101 |
|
| 102 |
-
|
| 103 |
-
|
| 104 |
|
| 105 |
-
###
|
| 106 |
-
- Detection: Agent abstains on answerable questions.
|
| 107 |
-
- Containment: Wrong abstentions get -abstention_bonus (-1.0).
|
| 108 |
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
|
| 113 |
-
|
| 114 |
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
|
| 120 |
-
##
|
| 121 |
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
|
| 127 |
-
|
| 128 |
|
| 129 |
-
|
| 130 |
-
2. **Dynamic thresholds:** Learn thresholds from historical data rather than hardcoding.
|
| 131 |
-
3. **Peer review:** Multiple oracles vote on controversial actions.
|
| 132 |
-
4. **Human-in-the-loop:** Escalate high-risk decisions to human reviewers with credit incentives.
|
|
|
|
# OCC: Formal System Definition

## Overview

OCC (Oracle-Credit-Compute) is a mechanism-design layer that governs agent access to compute, retrieval, debate turns, tool execution, and other resources. It treats compute allocation as a security boundary rather than a performance optimization.

## Core Insight

In multi-agent systems, compute is not neutral. Extra turns, tokens, and tool calls can amplify adversarial influence unless access to deliberation is governed by verified marginal contribution. OCC makes agent compute scarce, earned, scoped, decaying, and auditable.

---

## Formal Definition

### Entities

Let:
- **A** = {a₁, a₂, ..., aₙ} be a set of agents
- **T** = {t₁, t₂, ..., tₘ} be a set of tasks
- **R** = {r₁, r₂, ..., rₖ} be a set of resource types (model calls, retrieval, debate turns, tool execution, file writes, etc.)
- **C** = {c₁, c₂, ..., cₗ} be a set of capability scopes
- **O** be an Impact Oracle that maps (action, context, outcome) → score ∈ [−1, 1]

### Credit State

Each agent a has a credit vector at time step t:

```
credit[a, t] ∈ ℝ₊ (non-negative real)
```

Credits are:
- **Non-transferable**: ∀a,b ∈ A, a ≠ b, credit[b,t] cannot increase from credit[a,t]
- **Decaying**: credit[a, t+1] = decay(credit[a,t]) where decay(x) = x · δ, δ ∈ (0,1)
- **Task-scoped**: credits can be bound to a specific task τ
- **Capability-scoped**: credits can be earmarked for capability scope c

### Earning Function

```
earn(a, action, oracle_score, compute_cost) → Δ ∈ ℝ

Δ = f(oracle_score, compute_cost, calibration, abstention_utility)
```

Where f must satisfy:
- oracle_score < 0 → Δ ≤ 0 (negative contribution yields ≤ 0 credit)
- oracle_score = 0 → Δ = 0 (neutral action neither earns nor loses)
- oracle_score > 0 → Δ > 0 (positive contribution earns credit)
- compute_cost > 0 reduces Δ proportionally
- calibration_error > threshold reduces Δ
- confident_wrong action (high confidence + oracle_score < 0) → Δ < 0 (penalty)

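One way to satisfy these constraints together is a multiplicative cost discount, which shrinks a positive Δ without ever flipping its sign. The sketch below is only an illustration: the function body, the parameter names (`cost_scale`, `cal_threshold`, `confident_level`, `confident_multiplier`), and their default values are assumptions rather than part of the specification, and `abstention_utility` is left out for brevity.

```python
def earn(oracle_score, compute_cost, calibration_error=0.0, confidence=0.0,
         cost_scale=0.01, cal_threshold=0.25,
         confident_level=0.8, confident_multiplier=2.0):
    """Return a credit delta for one scored action (illustrative only)."""
    if oracle_score == 0:
        # Neutral action: neither earns nor loses credit.
        return 0.0
    if oracle_score < 0:
        # Negative contribution never earns credit; a confident-wrong
        # action is penalized more heavily (assumed multiplier).
        delta = float(oracle_score)
        if confidence > confident_level:
            delta *= confident_multiplier
        return delta
    # Positive contribution: discount by compute cost. Dividing by
    # (1 + cost) keeps delta strictly positive for any finite cost.
    delta = oracle_score / (1.0 + cost_scale * compute_cost)
    if calibration_error > cal_threshold:
        # Assumed fixed 50% reduction for poor calibration.
        delta *= 0.5
    return delta
```

A subtractive cost penalty would be simpler, but it could push Δ below zero for a positive oracle score, which would violate the third constraint.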
### Spend Function

```
spend(a, resource_type, capability_scope) → {allow, deny, downgrade, escalate, require_approval}

allow if: credit[a,t] ≥ cost(resource_type, capability_scope)
      AND a has capability_scope_policy[scope]
      AND credit_decay_rate[a] ≤ max_decay
      AND gaming_score[a] ≤ gaming_threshold
```

### Decay Schedule

```
decay(credit[t]) = credit[t] · δ

where:
  δ = 0.995 (per-turn decay, ~5% per 10 turns)
  Or task-scoped: δ = 1.0 until task completion, then δ = 0.0 (credits expire)
```

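To check the "~5% per 10 turns" figure, the per-turn schedule can be written in closed form as credit · δ^n. The helper names below (`decayed`, `half_life`, `task_scoped`) are hypothetical and only illustrate the arithmetic, assuming no credit is earned or spent in between.

```python
import math

DELTA = 0.995  # per-turn decay factor from the schedule above


def decayed(credit, turns, delta=DELTA):
    """Balance after `turns` idle turns with no earning or spending."""
    return credit * delta ** turns


def half_life(delta=DELTA):
    """Number of idle turns after which half of the balance remains."""
    return math.log(0.5) / math.log(delta)


def task_scoped(credit, task_completed):
    """Task-scoped variant: no decay during the task, full expiry after."""
    return 0.0 if task_completed else credit


print(round(decayed(100.0, 10), 2))  # about 95.11, i.e. roughly 5% lost
print(round(half_life(), 1))         # about 138.3 turns
```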
### Credit Caps

```
credit[a,t] ≤ credit_cap(capability_scope)

credit_cap translates to maximum resource access:
  Model calls: credit_cap / cost_per_call
  Retrieval calls: credit_cap / cost_per_retrieval
  Debate turns: credit_cap / cost_per_turn
```

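As a worked example of that translation, the sketch below takes a hypothetical cap of 20 credits (an assumed value, not one from the specification) and divides it by the base costs from the Resource Types and Costs table. The helper `max_accesses` is an illustrative name.

```python
def max_accesses(credit_cap, unit_cost):
    """Largest number of whole resource accesses a full cap can buy."""
    if unit_cost <= 0:
        raise ValueError("unit_cost must be positive")
    return int(credit_cap // unit_cost)


CAP = 20.0  # hypothetical cap for one capability scope

print(max_accesses(CAP, 5))  # model_call_large at cost 5 -> 4
print(max_accesses(CAP, 2))  # retrieval_call at cost 2  -> 10
print(max_accesses(CAP, 3))  # debate_turn at cost 3     -> 6
```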
### Oracle Scoring

```
oracle_score = α₁ · correctness(a, t, outcome)
             + α₂ · evidence_support(a, t, evidence)
             + α₃ · improvement_over_prior(a, t, prior_state)
             + α₄ · calibration(a, t, prediction, outcome)
             + α₅ · abstention_utility(a, t, decision_to_abstain)
             − β₁ · hallucination(a, t, evidence)
             − β₂ · confident_wrong(a, t, prediction, outcome, confidence)
             − β₃ · wasteful_compute(a, t, compute_used, value_produced)
             − β₄ · gaming_suspicion(a, t, action_pattern)

where:
  correctness: 1 if correct, 0 if incorrect, −1 if harmful
  evidence_support: 1 if evidence fully supports, 0 if neutral, −1 if contradicts
  improvement: + if better than prior, 0 if same, − if worse
  calibration: + if well-calibrated, − if overconfident
  abstention_utility: + if abstaining was correct, − if it was evasive but answerable
  hallucination: − if generated claim contradicts evidence
  confident_wrong: − if high confidence AND incorrect (larger penalty than regular wrong)
  wasteful_compute: − if compute used ≫ value produced
  gaming_suspicion: − if action pattern matches known gaming signatures

Default weights (tunable):
  α = [0.30, 0.15, 0.10, 0.10, 0.15]
  β = [0.20, 0.25, 0.15, 0.20]
```

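The weighted sum can be sketched as follows. Two points here are assumptions rather than statements from the specification: the penalty components are passed as non-negative magnitudes in [0, 1], and the result is clamped to [−1, 1] so that it matches the oracle's declared range (with the default weights, the unclamped sum can reach −1.6).

```python
ALPHA = (0.30, 0.15, 0.10, 0.10, 0.15)  # default positive-term weights
BETA = (0.20, 0.25, 0.15, 0.20)         # default penalty weights


def oracle_score(positive, penalties, alpha=ALPHA, beta=BETA):
    """Combine component scores into a single oracle score.

    positive:  (correctness, evidence_support, improvement,
                calibration, abstention_utility), each in [-1, 1]
    penalties: (hallucination, confident_wrong, wasteful_compute,
                gaming_suspicion), each a magnitude in [0, 1] (assumed)
    """
    if len(positive) != len(alpha) or len(penalties) != len(beta):
        raise ValueError("component count must match weight count")
    raw = sum(w * x for w, x in zip(alpha, positive))
    raw -= sum(w * x for w, x in zip(beta, penalties))
    # Clamp to the oracle's declared output range (assumed behavior).
    return max(-1.0, min(1.0, raw))
```

With the default weights, a perfect action with no penalties scores 0.80, so the top of the [−1, 1] range is unreachable unless the weights are retuned.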
### Reward Function (for RL/GRPO)

```
reward(a, action, context, outcome) =
    oracle_score(a, action, context, outcome)
  + abstention_utility
  + calibration_bonus
  − hallucination_penalty
  − confident_wrong_penalty
  − compute_cost · cost_multiplier
  − gaming_penalty(a, history)

Constrained to [−1, 1].
```

---

## System Invariants

1. **Non-transferability**: ∀a,b ∈ A, a ≠ b: Δcredit[b] from a's action = 0
2. **Positive decay**: ∀a: credit[a, t+1] ≤ credit[a, t] unless earned
3. **Capability scoping**: access(r) requires scope_policy[r] AND credit ≥ cost(r)
4. **External verification**: oracle_score depends only on oracle O, not on a
5. **Append-only ledger**: credit events are immutable once recorded
6. **Oracle separation**: spending agent cannot directly influence oracle O
7. **Negative contribution**: oracle_score < 0 → Δ ≤ 0
8. **Credit ≠ identity trust**: high credit does not imply trusted access to all resources
9. **Reversal possible**: credit can be retroactively reduced on new evidence
10. **Bounded credit**: credit[a,t] ≤ credit_cap(scope) always

---

## Ledger Event Schema

Every credit mutation produces an immutable event:

| Event | Fields |
|-------|--------|
| CREDIT_GRANTED | agent_id, amount, reason, oracle_score, task_id, timestamp |
| CREDIT_DECAYED | agent_id, amount_decayed, new_balance, timestamp |
| CREDIT_SPENT | agent_id, amount, resource_type, capability_scope, task_id, timestamp |
| TURN_DENIED | agent_id, reason (insufficient_credit/wrong_scope/gaming_threshold), timestamp |
| ORACLE_SCORE_RECORDED | agent_id, action_id, score, confidence, evidence_ref, timestamp |
| CAPABILITY_SCOPE_CHANGED | agent_id, old_scope, new_scope, reason, timestamp |
| AGENT_PENALIZED | agent_id, penalty_amount, reason, evidence, timestamp |
| VERIFICATION_REVERSED | original_event_hash, new_score, reason, timestamp |
| POOL_EXHAUSTED | task_id, remaining_credit, timestamp |
| POLICY_UPDATED | parameter_changes, reason, timestamp |

Each event includes:
- event_hash: SHA-256 of (previous_event_hash + event_data)
- parent_event_hash: chain to previous event
- agent_id
- task_id
- timestamp (UTC ISO 8601)
- capability_scope
- oracle_id
- score (if applicable)
- credit_delta
- reason (human-readable)
- evidence_pointer (URI or hash to evidence)

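The hash chaining described by `event_hash` and `parent_event_hash` can be sketched with Python's standard `hashlib` and `json` modules. The canonical JSON serialization, the all-zero genesis hash, and the helper names (`append_event`, `verify_chain`) are assumptions made for illustration; the specification only states that each hash covers the previous hash plus the event data. Editing any recorded event changes its hash and breaks every later link, which is what makes the append-only invariant checkable after the fact.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed parent hash for the first event


def event_hash(parent_hash, event_data):
    """SHA-256 over the parent hash plus canonically serialized data."""
    payload = parent_hash + json.dumps(
        event_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(ledger, event_data):
    """Append an event, linking it to the current tail of the ledger."""
    parent = ledger[-1]["event_hash"] if ledger else GENESIS
    record = {
        "parent_event_hash": parent,
        "event_hash": event_hash(parent, event_data),
        "data": event_data,
    }
    ledger.append(record)
    return record


def verify_chain(ledger):
    """Recompute every hash; any edited or reordered event fails."""
    parent = GENESIS
    for record in ledger:
        if record["parent_event_hash"] != parent:
            return False
        if record["event_hash"] != event_hash(parent, record["data"]):
            return False
        parent = record["event_hash"]
    return True
```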
---

## Resource Broker Decision Model

For each request (agent a, resource r, scope c):

```
function decide(a, r, c):
    if not has_scope(a, c):
        return DENY(reason="missing capability scope")

    if credit[a] < cost(r, c):
        if credit[a] >= cost(downgraded(r), c):
            return DOWNGRADE(alternative=downgraded(r), reason="insufficient credit for requested tier")
        return DENY(reason="insufficient credit")

    if gaming_score[a] > GAMING_THRESHOLD:
        return REQUIRE_APPROVAL(reason="gaming suspicion")

    if risk(r, a, c) > RISK_THRESHOLD:
        return REQUIRE_APPROVAL(reason="high-risk action")

    if credit[a] < cost(r, c) * 2:  # running low
        return ALLOW_WITH_WARNING(reason="low credit warning")

    return ALLOW
```

### Resource Types and Costs

| Resource | Base Cost | Capability Scope |
|----------|-----------|-----------------|
| model_call_small | 1 | basic_inference |
| model_call_large | 5 | premium_inference |
| retrieval_call | 2 | retrieval |
| verifier_call | 3 | verification |
| debate_turn | 3 | deliberation |
| file_write | 5 | tool_execution |
| shell_exec | 8 | tool_execution |
| memory_write | 2 | memory |
| human_escalation | 20 | escalation |

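Combining the decision pseudocode with this cost table gives a runnable sketch of the broker. The `Agent` dataclass, the `DOWNGRADES` map, both threshold values, and passing `risk` in as a precomputed number are assumptions made for illustration. The scope is looked up from the table rather than passed separately, and, as in the pseudocode, only the requested resource's scope is checked before a downgrade is offered. Note that `ALLOW_WITH_WARNING` appears in the decision pseudocode but not in the outcome set listed under Spend Function.

```python
from dataclasses import dataclass, field

# (base cost, capability scope) per resource, copied from the table above.
COSTS = {
    "model_call_small": (1, "basic_inference"),
    "model_call_large": (5, "premium_inference"),
    "retrieval_call": (2, "retrieval"),
    "verifier_call": (3, "verification"),
    "debate_turn": (3, "deliberation"),
    "file_write": (5, "tool_execution"),
    "shell_exec": (8, "tool_execution"),
    "memory_write": (2, "memory"),
    "human_escalation": (20, "escalation"),
}
DOWNGRADES = {"model_call_large": "model_call_small"}  # hypothetical map
GAMING_THRESHOLD = 0.5  # assumed value
RISK_THRESHOLD = 0.7    # assumed value


@dataclass
class Agent:
    credit: float
    scopes: set = field(default_factory=set)
    gaming_score: float = 0.0


def decide(agent, resource, risk=0.0):
    """Return (decision, detail) following the pseudocode's check order."""
    cost, scope = COSTS[resource]
    if scope not in agent.scopes:
        return ("DENY", "missing capability scope")
    if agent.credit < cost:
        alt = DOWNGRADES.get(resource)
        if alt is not None and agent.credit >= COSTS[alt][0]:
            return ("DOWNGRADE", alt)
        return ("DENY", "insufficient credit")
    if agent.gaming_score > GAMING_THRESHOLD:
        return ("REQUIRE_APPROVAL", "gaming suspicion")
    if risk > RISK_THRESHOLD:
        return ("REQUIRE_APPROVAL", "high-risk action")
    if agent.credit < cost * 2:
        return ("ALLOW_WITH_WARNING", "low credit warning")
    return ("ALLOW", "")
```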
---

## When To Use OCC

| OCC is valuable when | OCC is overkill when |
|---------------------|---------------------|
| Agents have heterogeneous reliability | Single-agent tasks suffice |
| Long-running tasks need budget discipline | Ground truth is immediate and cheap |
| Debate/collaboration can be poisoned | Adversarial participation is impossible |
| Compute is expensive | All agents have equal trust and capability |
| Auditability matters | Task budget is tiny (a few calls) |
| Agents can earn durable authority | Latency matters more than robustness |
| Post-hoc accountability required | Verifier/oracle cost exceeds saved compute |
| Agents can game naive allocation | There are no bad actors in the system |

---

## Threat Model

| Attack | What Adversary Controls | Success Condition | OCC Defense | Residual Risk |
|--------|------------------------|-------------------|-------------|---------------|
| Credit farming | Task selection | Accumulates budget via easy tasks | Decay + credit caps | Slow gaming over many cheap tasks |
| Collusion | Multiple agent identities | Transfers influence between agents | Non-transferability | Vote-ring behavior (same answer) |
| Oracle spoofing | Persuasive but wrong answers | Earns false credit | Verifier separation from spender | Judge hacking via prompt injection |
| Griefing | Burns others' budget | Lowers group accuracy | Capability-scoped spend | Indirect poisoning via bad data |
| Sandbagging | Hides competence | Manipulates future allocation | Decay + exploration bonus | Hard to detect without history |
| Identity laundering | Resets agent identity | Escapes penalties | Identity binding to account | Account churn (rate-limited) |
| Sybil agents | Many weak agents | Captures compute pool | Admission control | Deployment-specific new-account policy |
| Strategic abstention | Avoids penalties | Hoards credit | Reward shaping for participation | Conservatism bias |
| Verbosity gaming | Produces long but vacuous responses | Appears high-quality to weak oracle | Token-cost multiplier in reward | Requires quality oracle |
| Confidence manipulation | Overstates certainty | Earns calibration bonus deceptively | Proper scoring rules | Hard to calibrate perfectly |

---

## Relationship to Prior Work

OCC builds on:
- **AI safety debate** (Irving, Christiano, Amodei 2018): Debate as a mechanism for surfacing truth. OCC adds that debate turns are not free speech; they are auditable compute privileges.
- **GRPO/RLVR** (Shao et al. 2024): Group-relative policy optimization. OCC provides a reward function intended to steer GRPO toward sound allocation policies.
- **Proper scoring rules**: OCC's calibration and abstention rewards are proper scoring rule implementations.
- **Capability-based security**: OCC's broker follows OS capability-system principles applied to agent API access.

OCC departs from:
- **Budget-aware reasoning** (e.g., token-budget RL): OCC is not about *minimizing* compute; it is about *governing* compute access.
- **Adaptive inference** (early exit, cascade): OCC governs *who* gets compute, not *when* to stop computing.
- **Multi-agent debate for accuracy**: OCC does not claim debate improves accuracy. It claims debate *without allocation control* amplifies adversarial influence.

---

## Implementation Reference

Python package at: https://huggingface.co/narcolepticchicken/occ-stack

```
/occ
  /oracle      → oracle.py (Impact Oracle: scoring, marginal impact, proper scoring)
  /ledger      → ledger.py (Credit Ledger: non-transferable, decaying, scoped credits)
  /broker      → broker.py (Resource Broker: capability-based access control)
  /rl          → reward.py (Reward function combining oracle + anti-gaming)
               → grpo_hook.py (TRL GRPOTrainer integration)
  /benchmarks  → benchmark_debate.py, benchmark_code.py, benchmark_retrieval_qa.py
  /configs     → YAML configurations for experiments
  /reports     → results, analysis, final report
```

---

*Last updated: May 8, 2026. Version: 1.0.*