EpistemeAI Cookbook

Summary

This EpistemeAI model is based on GPT-OSS-20B and has been fine-tuned with the Unsloth RL framework to optimize inference efficiency while mitigating vulnerabilities such as reward hacking during reinforcement learning from human feedback (RLHF)–style training. The fine-tuning process emphasizes alignment robustness and efficiency, ensuring the model preserves its reasoning depth without incurring excessive computational overhead. The model delivers 3x faster inference for gpt-oss-rl, at ~21 tokens/s; in BF16, it also achieves the fastest inference (~30 tokens/s).*

  • This model was trained on OpenAI's harmony response format and should only be used with that format; it will not work correctly otherwise.
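
For illustration, here is a minimal sketch of rendering a harmony prompt with OpenAI's openai-harmony package (pip install openai-harmony); the Transformers examples below apply this format automatically through the chat template:

from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# Load the gpt-oss harmony encoding and render a conversation to prompt token ids.
enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
convo = Conversation.from_messages(
    [Message.from_role_and_content(Role.USER, "Explain quantum mechanics clearly and concisely.")]
)
prompt_token_ids = enc.render_conversation_for_completion(convo, Role.ASSISTANT)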

Highlights from gpt-oss-20b:

  • Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs (see the sketch after this list).
  • Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
  • Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
  • MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.
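
As referenced in the reasoning-effort bullet above, gpt-oss models read the desired effort level from the system prompt. A minimal sketch, assuming the "Reasoning: high" convention from the gpt-oss model card:

# Sketch: requesting high reasoning effort (low / medium / high) via the system prompt
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
# Pass `messages` to the pipeline or chat template exactly as in the examples below.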

Inference examples

Transformers

You can use `gpt-oss-20b-rl` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or our openai-harmony package; a sketch of the manual path follows the pipeline snippet below.

To get started, install the necessary dependencies to set up your environment:

pip install -U transformers kernels torch 

For Google Colab (free/Pro)

!pip install -q --upgrade torch

!pip install -q transformers triton==3.4 kernels

!pip uninstall -q torchvision torchaudio -y

Once set up, you can run the model with the snippet below:

from transformers import pipeline

model_id = "EpistemeAI/gpt-oss-20b-RL"

# Build a text-generation pipeline; the chat template applies the harmony format.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",  # select BF16/FP16 automatically
    device_map="auto",   # place the model on available GPU(s)
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=3000,
)

# The last entry in generated_text is the assistant's reply.
print(outputs[0]["generated_text"][-1])
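
If you prefer to call model.generate directly, as noted above, the tokenizer's chat template renders the harmony format for you. A minimal sketch using the same model id:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI/gpt-oss-20b-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

# apply_chat_template renders the harmony prompt; add_generation_prompt opens the assistant turn.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=3000)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))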

Benchmark

Tasks                      Version  Filter            n-shot  Metric       Episteme-gptoss-20b-rl  gpt-oss-20b  DeepSeek-V3.2-Exp  GLM-4.6
gsm8k (cot)                3        flexible-extract  5       exact_match  0.956                   0.78         -                  -
gpqa_diamond (cot)         2        flexible-extract  5       exact_match  0.8538+                 0.666        0.799              0.829
mmlu                       2        none              -       acc          0.8528+                 0.853        0.85               -
humaneval                  1        create_test       0       pass@1       0.8452+                 0.73         -                  -
mmlu_college_biology       1        none              2       acc          1.0                     1.0          -                  -
mmluprox_biology           1        none              5       acc          0.8452                  -            -                  -
mmluprox_computer_science  1        none              5       acc          0.7851                  -            -                  -
Math
AIME_2025I+II              1        none              5       acc          0.9495                  0.63*        0.893              0.93
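
The task names and filters in the table follow EleutherAI's lm-evaluation-harness conventions. Assuming that harness was used, a run like the following should approximate the gsm8k row (the exact task variant and generation settings are assumptions, not a statement of how the numbers above were produced):

# Hedged sketch: 5-shot gsm8k CoT with lm-evaluation-harness (pip install lm-eval)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EpistemeAI/gpt-oss-20b-RL,dtype=bfloat16",
    tasks=["gsm8k_cot"],  # assumed task variant
    num_fewshot=5,
)
print(results["results"]["gsm8k_cot"])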

Uploaded finetuned model

  • Developed by: EpistemeAI
  • License: apache-2.0
  • Finetuned from model: unsloth/gpt-oss-20b-unsloth-bnb-4bit

This gpt_oss model was trained 2x faster with Unsloth and Hugging Face's TRL library.
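
For readers who want to continue fine-tuning, a minimal sketch of loading the same 4-bit base checkpoint with Unsloth (the max_seq_length value is illustrative, not the setting used for this model):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b-unsloth-bnb-4bit",
    max_seq_length=2048,  # illustrative value
    load_in_4bit=True,
)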

Citation

@misc{openai2025gptoss120bgptoss20bmodel,
      title={gpt-oss-120b & gpt-oss-20b Model Card}, 
      author={OpenAI},
      year={2025},
      eprint={2508.10925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10925}, 
}
@misc{epistemeai2025rl,
  title={EpistemeAI RL},
  author={EpistemeAI Research},
  year={2025},
}