Brick Complexity Classifier v2: max

What is this?

Classifier v2 is a family of small adapters that score each incoming prompt as easy / medium / hard, so a router can send it to the right tier of a model pool. Two variants optimize for different goals:

  • eco: optimized for cost. Biases predictions toward easy so most traffic stays on the cheap tier. Use when the cost-per-query bill matters more than squeezing the last accuracy point.
  • max: optimized for routing accuracy. Gives the sharpest easy/medium/hard split, so hard queries reliably reach the strongest tier and easy ones stay cheap. Use when answer quality is paramount.

This card describes max: the maximum-accuracy variant, tuned to classify query complexity as precisely as possible. It prioritizes routing quality over cost.
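The label-to-tier dispatch described above can be sketched as follows. The tier model names here are hypothetical placeholders for illustration, not part of the Brick release:

```python
# Minimal routing sketch: map the classifier's label to a model tier.
# Tier names below are hypothetical examples, not actual Brick pool entries.
TIER_BY_LABEL = {
    "easy": "cheap-small-model",
    "medium": "mid-size-model",
    "hard": "frontier-model",
}

def route(label: str, default: str = "mid-size-model") -> str:
    """Return the target model for a classifier label.

    Falls back to the middle tier on an unexpected label, a design
    assumption for this sketch rather than documented Brick behavior.
    """
    return TIER_BY_LABEL.get(label.strip().lower(), default)
```

For example, `route("hard")` returns the strongest tier, while an unrecognized label falls back to the middle tier rather than failing the request.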

Regolo.ai | Brick SR1 on GitHub

License: CC BY-NC 4.0


Model Details

Property        Value
Variant         max
Target          Best classification accuracy, maximum routing quality
Base model      Qwen/Qwen3.5-0.8B
Adapter type    LoRA (r=32, α=32, dropout=0.1)
Output classes  3 (easy, medium, hard)
License         CC BY-NC 4.0
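The adapter hyperparameters in the table correspond to a PEFT configuration along these lines. This is a sketch for reference only; the target modules are an assumption (a typical choice for Qwen-style attention projections), since the card does not list them:

```python
from peft import LoraConfig

# LoRA hyperparameters taken from the Model Details table above.
# target_modules is an assumed typical choice, not confirmed by this card.
config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```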

Usage (PEFT)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model in bf16 and attach the classifier LoRA adapter
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-0.8B", torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-0.8B")
model = PeftModel.from_pretrained(base, "regolo/brick-complexity-2-max").eval()

system = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""
# ChatML prompt format used by the Qwen base model
prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Design a distributed consensus algorithm<|im_end|>\n<|im_start|>assistant\n"
ids = tok(prompt, return_tensors="pt").input_ids
# Greedy decoding; the label is a single word, so 3 new tokens suffice
out = model.generate(ids, max_new_tokens=3, do_sample=False)
# Decode only the newly generated tokens
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True).strip())
# Output: hard
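Decoding can return the label with extra whitespace, casing, or a trailing token, so downstream code may want to normalize the output before routing. A minimal stdlib-only sketch; the fallback to medium on unexpected output is a design assumption, not documented model behavior:

```python
VALID_LABELS = ("easy", "medium", "hard")

def normalize_label(raw: str, fallback: str = "medium") -> str:
    """Map raw generated text to one of easy/medium/hard.

    Takes the first whitespace-separated token, lowercases it, and matches
    it against the known labels. Falls back to `medium` (an assumption made
    by this sketch) if the model emitted something unexpected.
    """
    tokens = raw.strip().lower().split()
    if not tokens:
        return fallback
    first = tokens[0]
    for label in VALID_LABELS:
        if first.startswith(label):
            return label
    return fallback
```

This tolerates output like `" Hard\n"` or `"easy."` while guaranteeing the router always receives one of the three valid classes.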

Usage (vLLM)

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Serve the base model with LoRA support; max_lora_rank must cover the adapter's r=32
llm = LLM(
    model="Qwen/Qwen3.5-0.8B",
    enable_lora=True,
    max_lora_rank=32,
    dtype="bfloat16",
)
# Greedy decoding; the label is a single word
sp = SamplingParams(temperature=0, max_tokens=3)

system = """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""
prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\nClassify: Explain the rendering equation from radiometric first principles<|im_end|>\n<|im_start|>assistant\n"

# LoRARequest(adapter name, adapter id, adapter path or Hub id)
out = llm.generate(
    [prompt],
    sp,
    lora_request=LoRARequest("brick-complexity-2-max", 1, "regolo/brick-complexity-2-max"),
)
print(out[0].outputs[0].text.strip())
# Output: hard

About Brick

Regolo.ai is the EU-sovereign LLM inference platform built on Seeweb infrastructure. Brick is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.

Website | Docs | GitHub | Discord
