Libraries

How to use domyn/Domyn-Small-v1.0 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="domyn/Domyn-Small-v1.0")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("domyn/Domyn-Small-v1.0")
model = AutoModelForCausalLM.from_pretrained("domyn/Domyn-Small-v1.0")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use domyn/Domyn-Small-v1.0 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "domyn/Domyn-Small-v1.0"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "domyn/Domyn-Small-v1.0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
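
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; it assumes the server is reachable at localhost:8000 as in the curl example, and that the response follows the OpenAI chat-completions shape. The helper names `build_chat_request` and `ask` are made up for this illustration.

```python
import json
import urllib.request

# Assumption: `vllm serve` is listening on localhost:8000, as in the curl example.
BASE_URL = "http://localhost:8000/v1"
MODEL = "domyn/Domyn-Small-v1.0"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` sends the same request as the curl command.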

Use Docker

docker model run hf.co/domyn/Domyn-Small-v1.0

SGLang

How to use domyn/Domyn-Small-v1.0 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "domyn/Domyn-Small-v1.0" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "domyn/Domyn-Small-v1.0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "domyn/Domyn-Small-v1.0" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "domyn/Domyn-Small-v1.0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use domyn/Domyn-Small-v1.0 with Docker Model Runner:
```
docker model run hf.co/domyn/Domyn-Small-v1.0
```

`olmo3` reasoning parser crashes at startup on Domyn-Small-v1.0 tokenizer

by alescire94 - opened 2 days ago

Discussion

alescire94

2 days ago, edited 2 days ago

Hi! I'm running into a startup crash when running the model card's command with the olmo3 reasoning parser. Details below.

vLLM version: 0.21.0
Command:

uv run vllm serve domyn/Domyn-Small-v1.0 \
    --tensor-parallel-size 1 \
    --dtype bfloat16 \
    --max-model-len 32768 \
    --max-num-seqs 256 \
    --reasoning-parser olmo3

Error:

File ".../vllm/reasoning/olmo3_reasoning_parser.py", line 242, in __init__
    self.vocab[token] for token in self.think_end_first_split
KeyError: 'Ġ</'

Repro:

1. uv add vllm==0.21.0
2. Run the command above.
3. The server crashes at startup with the traceback shown.

iGenius-AI-Team

Domyn org 2 days ago

Hi, thanks for the report. We were able to reproduce it on our side.

It's a parser/tokenizer mismatch in vLLM 0.21's Olmo3ReasoningParser: its __init__ eagerly looks up GPT-2 BPE token strings ('Ġ</' etc.) in the vocab, but Domyn-Small uses a SentencePiece tokenizer where <think>/</think> aren't single vocab tokens, so the server dies at startup before serving any request.

vLLM 0.20 didn't have this eager check, which is why it worked there.
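
To illustrate the failure mode, here is a toy sketch of an eager vocab lookup next to a guarded check. It is not vLLM's code: the vocabularies, token ids, and the piece list beyond 'Ġ</' are invented for the example, and the function names are hypothetical.

```python
# Toy vocabularies; real ones come from each model's tokenizer.
# All entries and ids below are made up for illustration.
GPT2_STYLE_VOCAB = {"Ġ</": 0, "think": 1, ">": 2}
SENTENCEPIECE_STYLE_VOCAB = {"▁</": 0, "think": 1, ">": 2}

# Hypothetical stand-in for the token strings the parser expects to find.
EXPECTED_PIECES = ["Ġ</", "think", ">"]


def eager_lookup(vocab, pieces):
    """Index the vocab directly; raises KeyError on the first missing piece."""
    return [vocab[piece] for piece in pieces]


def missing_pieces(vocab, pieces):
    """Return the pieces absent from the vocab instead of raising."""
    return [piece for piece in pieces if piece not in vocab]
```

`eager_lookup(SENTENCEPIECE_STYLE_VOCAB, EXPECTED_PIECES)` raises `KeyError: 'Ġ</'`, the same error as in the traceback, while `missing_pieces` reports the gap without crashing. To check a real tokenizer, the dict returned by the Transformers `tokenizer.get_vocab()` method can be passed as `vocab`.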

Two quick options while we sort it out:

1. Pin to vLLM 0.20.0: known good, no other change needed.
2. Or wait a couple of days: we'll ship a small reasoning-parser plugin (loadable via --reasoning-parser-plugin) along with usage instructions in the model card.

Will follow up here once it's published.

iGenius-AI-Team

Domyn org about 21 hours ago

Hi @alescire94, we've just pushed the custom reasoning parser plugin.
You can find instructions on how to use it in the README.

Thank you again for flagging the issue.
