Instructions for using RAS1981/Qwen3-4B-outreach-stage4 with libraries and local apps.

Libraries

How to use RAS1981/Qwen3-4B-outreach-stage4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RAS1981/Qwen3-4B-outreach-stage4")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RAS1981/Qwen3-4B-outreach-stage4")
model = AutoModelForCausalLM.from_pretrained("RAS1981/Qwen3-4B-outreach-stage4")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use RAS1981/Qwen3-4B-outreach-stage4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RAS1981/Qwen3-4B-outreach-stage4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RAS1981/Qwen3-4B-outreach-stage4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/RAS1981/Qwen3-4B-outreach-stage4

SGLang

How to use RAS1981/Qwen3-4B-outreach-stage4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RAS1981/Qwen3-4B-outreach-stage4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RAS1981/Qwen3-4B-outreach-stage4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RAS1981/Qwen3-4B-outreach-stage4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RAS1981/Qwen3-4B-outreach-stage4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use RAS1981/Qwen3-4B-outreach-stage4 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RAS1981/Qwen3-4B-outreach-stage4 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RAS1981/Qwen3-4B-outreach-stage4 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RAS1981/Qwen3-4B-outreach-stage4 to start chatting

Load model with FastModel

# Install first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="RAS1981/Qwen3-4B-outreach-stage4",
    max_seq_length=2048,
)

Docker Model Runner
How to use RAS1981/Qwen3-4B-outreach-stage4 with Docker Model Runner:
```
docker model run hf.co/RAS1981/Qwen3-4B-outreach-stage4
```

Qwen3-4B Outreach Agent — Prompt-Internalized (Stage 4)

RAS1981/Qwen3-4B-outreach-stage4

A compact Russian real-estate operator model trained through a 5-stage curriculum (CPT → S1 → S2 → S3 → S4) to internalize long, complex system prompts. This final Stage 4 version needs no prompt scaffolding at inference, which shortens the input and so reduces time to first token (TTFT). It is trained for stable multi-turn dialogue and consistent sales-oriented behavior.

Model Description

Stage 4 is the final distilled checkpoint in a progressive prompt-internalization pipeline. The model acts as a Russian real-estate qualification agent, trained to:

Greet users, set conversation tone, and collect key parameters: район (district), бюджет (budget), сроки (timeline).
Handle noisy/fragmented queries, objections, misclicks, corrections.
Maintain long, multi-turn conversation state internally (no system prompt needed).
Direct the client toward booking a call, meeting, or viewing.
Stay within business rules and safely decline out-of-scope topics.
Produce structured, concise operator-style messages (bullet points, quick summaries).

The model has been aligned across 4 SFT stages and 1 domain-pretrain stage to operate fully autonomously.
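
The OpenAI-compatible chat endpoint is stateless, so "no system prompt" in practice means the caller resends only the user and assistant turns on each request. The sketch below shows one way to keep that history on the client side. The `Conversation` class, the `send` callable, and `echo_backend` are hypothetical helpers introduced for illustration; they are not part of the model or of any library.

```python
from dataclasses import dataclass, field
from typing import Callable

Message = dict[str, str]


@dataclass
class Conversation:
    """Keeps the running chat history with no system message (hypothetical helper)."""

    history: list[Message] = field(default_factory=list)

    def ask(self, user_text: str, send: Callable[[list[Message]], str]) -> str:
        # Append the user turn, send the full history, then record the reply.
        self.history.append({"role": "user", "content": user_text})
        reply = send(list(self.history))
        self.history.append({"role": "assistant", "content": reply})
        return reply


def echo_backend(messages: list[Message]) -> str:
    """Stand-in for a real model call; reports how many turns it received."""
    return f"received {len(messages)} message(s)"


if __name__ == "__main__":
    chat = Conversation()
    print(chat.ask("здравствуйте", echo_backend))
    print(chat.ask("бюджет до 10 млн", echo_backend))
```

In real use, `send` would wrap a chat-completions call against the served model and return the text of the first choice.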

Training Stages Overview

(No hyperparameters disclosed — only conceptual behavior.)

Stage 0 — Continued Pretrain (Domain CPT)

Large corpus of Russian real-estate text; builds robust domain vocabulary, patterns, and document-level reasoning.

Stage 1 — Full 41k Prompt

Full system template + easy queries; teaches tone, etiquette, greetings, qualification patterns, and safety rules.

Stage 2 — Core 15k Rules

Mid-level compression; model begins internalizing main scripts, question ordering, CTAs, and objection handling.

Stage 3 — 3–5k Summary Prompt

High-compression stage; strengthens behavior even when template is short or partially omitted.

Stage 4 — Zero Prompt

Final distilled agent; fully internalized scripts, tone, policy, and flow — works with query-only inference.
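
The SFT stages above can be read as progressively shrinking the system prompt while the target behavior stays fixed. The sketch below only illustrates that idea. The `Stage` records, the placeholder prompt strings, and `build_sft_messages` are hypothetical, the size labels are copied from the stage names without assuming their units, and the actual training data format is not disclosed in this card. Stage 0 (CPT) is left out because it is described as pretraining on raw text rather than on chat examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    prompt_label: str   # size label as given in the card; units are not stated
    system_prompt: str  # placeholder text, NOT the real training prompt


# Placeholder prompts stand in for the undisclosed templates.
SFT_STAGES = [
    Stage("Stage 1", "41k", "<full template>"),
    Stage("Stage 2", "15k", "<core rules>"),
    Stage("Stage 3", "3-5k", "<summary prompt>"),
    Stage("Stage 4", "zero", ""),
]


def build_sft_messages(stage: Stage, query: str, answer: str) -> list[dict[str, str]]:
    """Assemble one chat example; the system turn is dropped when the prompt is empty."""
    messages = []
    if stage.system_prompt:
        messages.append({"role": "system", "content": stage.system_prompt})
    messages.append({"role": "user", "content": query})
    messages.append({"role": "assistant", "content": answer})
    return messages


if __name__ == "__main__":
    for stage in SFT_STAGES:
        roles = [m["role"] for m in build_sft_messages(stage, "здравствуйте", "...")]
        print(stage.name, stage.prompt_label, roles)
```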

Recommended Inference Settings

Temperature: 0.1 (max stability; operator tone)
Top-p: 1.0
Max tokens: 2000

System prompt (optional but recommended):

<system_instructions>
Вы Александр Оператор по недвижимости в Центр Подбора Новостроек 
Ваша миссия определить квалифицированных потенциальных клиентов 
для приобретения новостроек и обеспечить их связь со специализированными консультантами
</system_instructions>

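Stage 4 does not require a system prompt, but including this short header helps keep the persona and tone consistent across sessions. In English, the header reads: "You are Alexander, a real-estate operator at the New-Build Selection Center. Your mission is to identify qualified prospective buyers of new-build housing and connect them with specialized consultants."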
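
The recommended settings can be collected into a single request payload. This is a minimal sketch, assuming the OpenAI-compatible chat-completions schema; `build_request`, `include_header`, and the constant names are hypothetical and introduced here only for illustration. The header text is kept verbatim in Russian.

```python
MODEL_ID = "RAS1981/Qwen3-4B-outreach-stage4"

# Optional header from the card, kept verbatim in Russian.
SYSTEM_HEADER = (
    "<system_instructions>"
    "Вы Александр Оператор по недвижимости в Центр Подбора Новостроек "
    "Ваша миссия определить квалифицированных потенциальных клиентов "
    "для приобретения новостроек и обеспечить их связь "
    "со специализированными консультантами"
    "</system_instructions>"
)

# Recommended sampling settings from the card.
SAMPLING = {"temperature": 0.1, "top_p": 1.0, "max_tokens": 2000}


def build_request(user_text: str, include_header: bool = True) -> dict:
    """Return keyword arguments for a chat-completions call."""
    messages = []
    if include_header:
        messages.append({"role": "system", "content": SYSTEM_HEADER})
    messages.append({"role": "user", "content": user_text})
    return {"model": MODEL_ID, "messages": messages, **SAMPLING}


if __name__ == "__main__":
    request = build_request("здравствуйте")
    print(request["temperature"], request["top_p"], request["max_tokens"])
    print([m["role"] for m in request["messages"]])
```

With the OpenAI Python client, the result can be passed as `client.chat.completions.create(**build_request("..."))`. Setting `include_header=False` gives the query-only form.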

🚀 Quickstart: Run with vLLM (recommended)

1. Install environment

apt update && apt install -y python3-pip git  # Basics (30s)
pip install uv                                # Fast resolver (recommended)
uv venv main --python 3.12                    # Create isolated env
source main/bin/activate                      # Activate venv
uv pip install vllm                            # Install vLLM 0.11.0+ (2–5 min)

2. Serve the model with vLLM

vllm serve RAS1981/Qwen3-4B-outreach-stage4 \
  --max-model-len 8000 \
  --dtype auto \
  --enable-chunked-prefill \
  --max-num-batched-tokens 4000 \
  --port 8000 \
  --host 0.0.0.0 \
  --api-key token-abc123 \
  --trust-remote-code \
  --enforce-eager \
  --download-dir /tmp/hf_cache/models

vLLM will expose an OpenAI-compatible endpoint: http://<server>:8000/v1
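
Before sending chat requests, it can be useful to confirm that the server is up and that the API key is accepted. The sketch below queries the model list using only the standard library. It assumes the server runs locally on port 8000 with the `--api-key` value shown above; `models_request` and `list_model_ids` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: server runs locally on port 8000
API_KEY = "token-abc123"               # must match the --api-key passed to vllm serve


def models_request(base_url: str = BASE_URL, api_key: str = API_KEY) -> urllib.request.Request:
    """Build a GET request for the model list, with the bearer token attached."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def list_model_ids(timeout: float = 5.0) -> list[str]:
    """Query the running server and return the served model ids."""
    with urllib.request.urlopen(models_request(), timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload["data"]]


if __name__ == "__main__":
    try:
        print(list_model_ids())
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print(f"server not reachable: {exc}")
```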

🧪 TTFT Test (OpenAI client compatible)

import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1", 
    api_key="token-abc123"
)

def measure_ttft():
    start = time.perf_counter()
    first_seen = False

    response = client.chat.completions.create(
        model="RAS1981/Qwen3-4B-outreach-stage4",
        messages=[
            {
                'role': 'system',
                'content': (
                    '<system_instructions>'
                    'Вы Александр Оператор по недвижимости в Центр Подбора Новостроек '
                    'Ваша миссия определить квалифицированных потенциальных клиентов '
                    'для приобретения новостроек и обеспечить их связь '
                    'со специализированными консультантами'
                    '</system_instructions>'
                )
            },
            {"role": "user", "content": "здравствуйте"}
        ],
        max_tokens=2000,
        temperature=0.1,
        stream=True,
    )

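    for chunk in response:
        # Some stream chunks carry no choices (e.g. usage-only chunks); skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content

        if text and not first_seen:
            # TTFT: elapsed time until the first non-empty content delta arrives.
            ttft_ms = (time.perf_counter() - start) * 1000
            print(f">>> TTFT: {ttft_ms:.0f} ms\n")
            first_seen = True

        if text:
            print(text, end="", flush=True)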

if __name__ == "__main__":
    measure_ttft()
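
A single run can be noisy, and the first request may include warm-up costs, so it can help to repeat the measurement and summarize the results. This is a minimal sketch using only the standard library; `summarize_ttft` is a hypothetical helper, and the sample values are made-up placeholders, not measurements of this model.

```python
import statistics


def summarize_ttft(samples_ms: list[float]) -> dict[str, float]:
    """Summarize repeated TTFT measurements given in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(summarize_ttft([120.0, 95.0, 101.0]))
```

To feed it real data, `measure_ttft` would need to return the measured value instead of only printing it.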

Model size: 4B params (Safetensors)
Tensor type: BF16

Model tree for RAS1981/Qwen3-4B-outreach-stage4

Base model: RAS1981/Qwen3-4B-outreach-stage0
Finetuned: RAS1981/Qwen3-4B-outreach-stage1
Finetuned: RAS1981/Qwen3-4B-outreach-stage2
Finetuned: RAS1981/Qwen3-4B-outreach-stage3
Finetuned (5): this model