ContextRL-Klear-AgentForge-8B

This is the agentic (long-horizon) model released with the paper Context-Aware RL for Agentic and Multimodal LLMs. It is fine-tuned from Klear-AgentForge-8B, a model specialized for complex agentic coding, using ContextRL, a context-aware reinforcement learning method that augments standard GRPO with an auxiliary context-selection objective to improve fine-grained context grounding in long-horizon agent trajectories.

Results

Across 5 long-horizon benchmarks (2 in-distribution agentic coding, 3 out-of-distribution), ContextRL improves over the standard GRPO baseline by +3.2 points on average, while improving every individual benchmark.

Benchmark Base RL (GRPO) ContextRL (Ours)
SWE-Bench Verified 26.6 28.0 30.2
SWE-Bench Lite 21.0 21.7 24.0
LiveCodeBench v6 21.7 22.3 24.0
LongBench v2 (Overall) 27.4 27.0 29.6
LongBench v2 (Long) 21.3 24.1 28.7
NIAH 68.3 65.5 71.3

Metrics: SWE-Bench Verified/Lite resolve rate (%), LiveCodeBench v6 solve rate (%), LongBench v2 accuracy (%), NIAH mean recall (%). On the long-context tasks (LongBench v2, NIAH) where standard outcome-based GRPO struggles or regresses, ContextRL surpasses both the base model and the RL baseline, demonstrating strong out-of-distribution generalization.

Usage

This model follows the same interface as its Klear-AgentForge-8B base and can be loaded with transformers. Training and evaluation code, data construction pipelines, and detailed configurations are available in the repository: 👉 https://github.com/xupy2003/ContextAwareRL Please refer to the repo's README for environment setup, inference scripts, and reproduction instructions.

Downloads last month
16
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including xupy21/ContextRL_Klear_AgentForge_8B