Summary
This EpistemeAI model is based on GPT-OSS-20B and has been fine-tuned with the Unsloth RL framework to optimize inference efficiency while mitigating vulnerabilities such as reward hacking during reinforcement learning from human feedback (RLHF)–style training. The fine-tuning process emphasizes alignment robustness and efficiency, ensuring the model preserves its reasoning depth without incurring excessive computational overhead. The model delivers 3x faster inference for gpt-oss-rl, at ~21 tokens/s; in BF16 it also achieves the fastest inference, at ~30 tokens/s.
- This model was trained on OpenAI's Harmony response format and should only be used with the Harmony format, as it will not work correctly otherwise.
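If you build prompts yourself instead of going through a chat template, the openai-harmony package can render them for you. A minimal sketch, following the API names published in the [openai/harmony](https://github.com/openai/harmony) repo (verify against its README before relying on it):

```python
# Sketch: render a Harmony-format prompt to token IDs with openai-harmony
# (pip install openai-harmony); names follow the openai/harmony README.
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# Load the gpt-oss Harmony encoding
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# Build a one-turn conversation and render it for assistant completion
convo = Conversation.from_messages(
    [Message.from_role_and_content(Role.USER, "Explain quantum mechanics clearly and concisely.")]
)
prompt_tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
```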
Highlights from gpt-oss-20b:
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs (see the sketch after this list).
- Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. The chain-of-thought is not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, letting `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.
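To illustrate the configurable reasoning effort, here is a minimal sketch assuming the documented gpt-oss convention of stating the effort level in the system turn:

```python
# Sketch: gpt-oss models read the reasoning effort from the system message
# ("Reasoning: low" | "Reasoning: medium" | "Reasoning: high").
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
```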
Inference examples
Transformers
You can use `gpt-oss-20b-rl` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package.
To get started, install the necessary dependencies to set up your environment:

```shell
pip install -U transformers kernels torch
```

For Google Colab (free/Pro):

```shell
!pip install -q --upgrade torch
!pip install -q transformers triton==3.4 kernels
!pip uninstall -q torchvision torchaudio -y
```
Once set up, you can run the model with the snippet below:
```python
from transformers import pipeline
import torch

model_id = "EpistemeAI/gpt-oss-20b-RL"

# device_map="auto" places the model across available GPUs/CPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=3000,
)
# The last entry in generated_text is the assistant's reply
print(outputs[0]["generated_text"][-1])
```
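If you prefer calling `model.generate` directly, the tokenizer's chat template applies the Harmony format for you. A minimal sketch; parameter choices such as `max_new_tokens` are illustrative:

```python
# Sketch: direct generation with AutoModelForCausalLM;
# apply_chat_template renders the Harmony-format prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI/gpt-oss-20b-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```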
Benchmark
| Tasks | Version | Filter | n-shot | Metric | EpistemeAI gpt-oss-20b-RL | gpt-oss-20b | DeepSeek-V3.2-Exp | GLM-4.6 |
|---|---|---|---|---|---|---|---|---|
| gsm8k (cot) | 3 | flexible-extract | 5 | exact_match | 0.956 | 0.78 | - | - |
| gpqa_diamond (cot) | 2 | flexible-extract | 5 | exact_match | 0.8538+ | 0.666 | 0.799 | 0.829 |
| mmlu | 2 | none | - | acc | 0.8528+ | 0.853 | 0.85 | - |
| humaneval | 1 | create_test | 0 | pass@1 | 0.8452+ | 0.73 | - | - |
| mmlu_college_biology | 1 | none | 2 | acc | 1.0 | 1.0 | - | - |
| mmluprox_biology | 1 | none | 5 | acc | 0.8452 | - | - | - |
| mmluprox_computer_science | 1 | none | 5 | acc | 0.7851 | - | - | - |
| **Math** | | | | | | | | |
| AIME_2025I+II | 1 | none | 5 | acc | 0.9495 | 0.63* | 0.893 | 0.93 |
- Benchmarks were run with the Language Model Evaluation Harness (lm-eval).
- *Score from ArtificialAnalysis.ai
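To reproduce a row, something like the following sketch using lm-eval's Python API should work; the task name and few-shot count are assumptions here, so check the harness's task registry and adjust to match the table:

```python
# Sketch: re-run the gsm8k CoT row with the LM Evaluation Harness
# (pip install lm-eval). Task name and num_fewshot are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EpistemeAI/gpt-oss-20b-RL,dtype=bfloat16",
    tasks=["gsm8k_cot"],
    num_fewshot=5,
)
print(results["results"])
```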
Uploaded fine-tuned model
- Developed by: EpistemeAI
- License: apache-2.0
- Finetuned from model: unsloth/gpt-oss-20b-unsloth-bnb-4bit
This gpt_oss model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Reference:
- Unsloth's gpt-oss Reinforcement Learning
Citation
```bibtex
@misc{openai2025gptoss120bgptoss20bmodel,
  title={gpt-oss-120b & gpt-oss-20b Model Card},
  author={OpenAI},
  year={2025},
  eprint={2508.10925},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.10925},
}

@misc{epistemeai2025rl,
  title={EpistemeAI RL},
  author={EpistemeAI Research},
  year={2025},
}
```
