Summary
This EpistemeAI model is based on GPT-OSS-20B and has been fine-tuned with the Unsloth RL framework to optimize inference efficiency while mitigating vulnerabilities such as reward hacking during reinforcement learning from human feedback (RLHF)–style training. The fine-tuning process emphasizes alignment robustness and efficiency, ensuring the model preserves its reasoning depth without incurring excessive computational overhead. The model delivers 3x faster inference for gpt-oss-rl, at ~21 tokens/s; in BF16 it also achieves the fastest inference, at ~30 tokens/s.
- This model was trained on OpenAI's Harmony response format and should only be used with the Harmony format, as it will not work correctly otherwise.
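If you build prompts yourself instead of going through a chat template, the openai-harmony package can render them for you. A minimal sketch, following the API names published in the [openai/harmony](https://github.com/openai/harmony) repo (verify against its README before relying on it):

```python
# Sketch: render a Harmony-format prompt to token IDs with openai-harmony
# (pip install openai-harmony); names follow the openai/harmony README.
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# Load the gpt-oss Harmony encoding
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# Build a one-turn conversation and render it for assistant completion
convo = Conversation.from_messages(
    [Message.from_role_and_content(Role.USER, "Explain quantum mechanics clearly and concisely.")]
)
prompt_tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
```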
Highlights from gpt-oss-20b:
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs (see the sketch after this list).
- Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. The chain-of-thought is not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, letting `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.
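To illustrate the configurable reasoning effort, here is a minimal sketch assuming the documented gpt-oss convention of stating the effort level in the system turn:

```python
# Sketch: gpt-oss models read the reasoning effort from the system message
# ("Reasoning: low" | "Reasoning: medium" | "Reasoning: high").
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
```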
Inference examples
Transformers
You can use `gpt-oss-20b-rl` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package.
To get started, install the necessary dependencies to set up your environment:

```shell
pip install -U transformers kernels torch
```

For Google Colab (free/Pro):

```shell
!pip install -q --upgrade torch
!pip install -q transformers triton==3.4 kernels
!pip uninstall -q torchvision torchaudio -y
```
Once set up, you can run the model with the snippet below:
```python
from transformers import pipeline
import torch

model_id = "EpistemeAI/gpt-oss-20b-RL"

# device_map="auto" places the model across available GPUs/CPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=3000,
)
# The last entry in generated_text is the assistant's reply
print(outputs[0]["generated_text"][-1])
```
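If you prefer calling `model.generate` directly, the tokenizer's chat template applies the Harmony format for you. A minimal sketch; parameter choices such as `max_new_tokens` are illustrative:

```python
# Sketch: direct generation with AutoModelForCausalLM;
# apply_chat_template renders the Harmony-format prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI/gpt-oss-20b-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```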
Benchmark
| Tasks | Version | Filter | n-shot | Metric | EpistemeAI gpt-oss-20b-RL | gpt-oss-20b | DeepSeek-V3.2-Exp | GLM-4.6 |
|---|---|---|---|---|---|---|---|---|
| gsm8k (cot) | 3 | flexible-extract | 5 | exact_match | 0.956 | 0.78 | - | - |
| gpqa_diamond (cot) | 2 | flexible-extract | 5 | exact_match | 0.8538+ | 0.666 | 0.799 | 0.829 |
| mmlu | 2 | none | - | acc | 0.8528+ | 0.853 | 0.85 | - |
| humaneval | 1 | create_test | 0 | pass@1 | 0.8452+ | 0.73 | - | - |
| mmlu_college_biology | 1 | none | 2 | acc | 1.0 | 1.0 | - | - |
| mmluprox_biology | 1 | none | 5 | acc | 0.8452 | - | - | - |
| mmluprox_computer_science | 1 | none | 5 | acc | 0.7851 | - | - | - |
| **Math** | | | | | | | | |
| AIME_2025I+II | 1 | none | 5 | acc | 0.9495 | 0.63* | 0.893 | 0.93 |
- Benchmarks were run with the Language Model Evaluation Harness (lm-eval).
- *Score from ArtificialAnalysis.ai
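To reproduce a row, something like the following sketch using lm-eval's Python API should work; the task name and few-shot count are assumptions here, so check the harness's task registry and adjust to match the table:

```python
# Sketch: re-run the gsm8k CoT row with the LM Evaluation Harness
# (pip install lm-eval). Task name and num_fewshot are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EpistemeAI/gpt-oss-20b-RL,dtype=bfloat16",
    tasks=["gsm8k_cot"],
    num_fewshot=5,
)
print(results["results"])
```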
Uploaded fine-tuned model
- Developed by: EpistemeAI
- License: apache-2.0
- Finetuned from model: unsloth/gpt-oss-20b-unsloth-bnb-4bit
This gpt_oss model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Reference:
- Unsloth's gpt-oss Reinforcement Learning
Citation
```bibtex
@misc{openai2025gptoss120bgptoss20bmodel,
  title={gpt-oss-120b & gpt-oss-20b Model Card},
  author={OpenAI},
  year={2025},
  eprint={2508.10925},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.10925},
}

@misc{epistemeai2025rl,
  title={EpistemeAI RL},
  author={EpistemeAI Research},
  year={2025},
}
```
