Upload README.md
README.md
ADDED
@@ -0,0 +1,48 @@
# Oracle-Credit-Compute (OCC) Stack
A minimal, open-source system for **agentic compute allocation** via verified marginal impact.
## Core Thesis
Modern agent systems waste test-time compute because every agent, tool call, debate turn, or verifier pass consumes resources without proving marginal value. OCC allocates compute, retrieval, write privileges, and debate bandwidth toward actions that measurably improve task outcomes.
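
The allocation idea above can be sketched as a simple gate: after each step runs, measure its verified gain per unit of compute and stop granting further compute once that gain falls below a threshold. This is a minimal illustration, not the repository's implementation; `Step`, `run_until_unproductive`, and the `min_value_per_cost` threshold are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    name: str
    cost: float         # compute units spent by this step (hypothetical unit)
    score_after: float  # verified task score once the step has run


def run_until_unproductive(
    steps: List[Step], start_score: float, min_value_per_cost: float
) -> Tuple[float, float]:
    """Return (compute spent, best verified score), stopping after the
    first step whose marginal gain per unit of compute is too low."""
    spent = 0.0
    score = start_score
    for step in steps:
        if step.cost <= 0:
            raise ValueError("step cost must be positive")
        spent += step.cost  # the step has already run, so its cost is paid
        value = (step.score_after - score) / step.cost
        score = max(score, step.score_after)  # keep the best verified result
        if value < min_value_per_cost:
            break  # stop granting further compute
    return spent, score


steps = [
    Step("draft", 1.0, 0.5),
    Step("verify", 1.0, 0.8),
    Step("debate", 2.0, 0.8),  # no measurable gain: allocation stops here
    Step("retry", 2.0, 0.9),
]
spent, score = run_until_unproductive(steps, start_score=0.0, min_value_per_cost=0.05)
```

In this toy trace a fixed budget would run all four steps, while the gate stops after the third. A greedy gate like this can also miss late gains (the unrun `retry` step), which is one reason a real allocator needs more than a single threshold.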
## Components
| Component | Purpose |
|-----------|---------|
| `oracle/` | Impact Oracle — scores whether an action produced measurable marginal value |
| `ledger/` | Credit Ledger — non-transferable, decaying credits based on verified impact |
| `broker/` | Resource Broker — capability-based rights based on credits, task state, and risk |
| `rl/` | GRPO-compatible reward hook that uses the Oracle score as the reward signal |
| `benchmarks/` | Tight, verifiable benchmarks: code, retrieval QA, multi-agent debate |
| `configs/` | Experiment configurations |
| `reports/` | Results, ablations, anti-gaming tests |
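
To make the roles in the table concrete, here is a rough sketch of how a ledger, a broker, and a group-relative reward hook might fit together. Everything in it is an assumption for illustration: the class and function names, the half-life decay rule, the rights thresholds, and the risk scaling are hypothetical and are not taken from `ledger/`, `broker/`, or `rl/`. The only part that follows a known formula is the advantage computation, which normalizes rewards within a group by their mean and standard deviation, as GRPO does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CreditLedger:
    """Per-agent credits that decay over time and cannot be transferred."""
    half_life: float = 10.0  # hypothetical: steps until a balance halves
    balances: Dict[str, float] = field(default_factory=dict)

    def decay(self, steps: float = 1.0) -> None:
        factor = 0.5 ** (steps / self.half_life)
        for agent in self.balances:
            self.balances[agent] *= factor

    def credit(self, agent: str, impact_score: float) -> None:
        # Only positive, verified impact earns credit; there is
        # deliberately no method for moving credits between agents.
        earned = max(0.0, impact_score)
        self.balances[agent] = self.balances.get(agent, 0.0) + earned


def grant_rights(credits: float, risk: float) -> Set[str]:
    """Broker rule (assumed): riskier task states need more credits per right."""
    thresholds = {"retrieve": 0.5, "extra_compute": 1.0, "write": 2.0}
    return {r for r, need in thresholds.items() if credits >= need * (1.0 + risk)}


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize Oracle rewards within one sampled group (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


ledger = CreditLedger(half_life=1.0)
ledger.credit("agent_a", 4.0)  # Oracle verified an impact of 4.0
ledger.decay(1.0)              # one half-life later the balance is halved
rights_low_risk = grant_rights(ledger.balances["agent_a"], risk=0.0)
rights_high_risk = grant_rights(ledger.balances["agent_a"], risk=1.0)
```

With the same balance, the high-risk state loses the `write` right in this sketch, which shows how rights can depend on credits and risk together rather than on credits alone.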
## Quick Start
```bash
pip install -r requirements.txt
python -m benchmarks.benchmark_code # Code compute allocation
python -m benchmarks.benchmark_retrieval_qa # Retrieval QA
python -m benchmarks.benchmark_debate # Multi-agent debate
python -m eval_runner # Run all ablations
```
## Design
See [design.md](design.md) for architecture, reward formulas, and anti-gaming mechanisms.
## Literature Review
See [reports/literature_review.md](reports/literature_review.md) for an analysis of prior work.
## Results Summary
- **Code compute allocation**: OCC achieves a **66.8% compute reduction** at equal or higher accuracy than the fixed-budget baseline.
- **Retrieval QA**: OCC produces fewer confident-wrong answers and stops retrieval adaptively.
- **Multi-agent debate**: OCC matches the accuracy of the equal-turns baseline while using 12.4% less compute.
- **Anti-gaming**: Spam, hidden-test gaming, and over-abstention are all contained.
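
For readers checking the percentages above, a compute reduction is presumably one minus the ratio of OCC's compute to the baseline's compute. The sketch below shows that arithmetic; the function name and the cost figures are made up for illustration and are not the measured values from `reports/`.

```python
def compute_reduction(baseline_cost: float, method_cost: float) -> float:
    """Fraction of baseline compute saved by the method."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return 1.0 - method_cost / baseline_cost


# Illustrative costs only (arbitrary units), chosen to match the headline figures.
code_saving = compute_reduction(baseline_cost=1000.0, method_cost=332.0)
debate_saving = compute_reduction(baseline_cost=1000.0, method_cost=876.0)
print(f"{code_saving:.1%} {debate_saving:.1%}")
```

A reduction figure like this is only meaningful next to the accuracy it was achieved at, which is why the bullets above pair each saving with an accuracy comparison.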
## License
MIT