Qwen3-4B Outreach Agent — Prompt-Internalized (Stage 4)
RAS1981/Qwen3-4B-outreach-stage4
A compact Russian-language real-estate operator model trained through a 5-stage curriculum (CPT → S1 → S2 → S3 → S4) to internalize long, complex system prompts. This final Stage 4 version is designed to run without prompt scaffolding at inference, aiming for low time to first token (TTFT), stable multi-turn reasoning, and consistent sales-oriented behavior.
Model Description
Stage 4 is the final distilled checkpoint in a progressive prompt-internalization pipeline. The model acts as a Russian real-estate qualification agent, trained to:
- Greet users, set the conversation tone, and collect key parameters: district, budget, and timeline (район, бюджет, сроки).
- Handle noisy/fragmented queries, objections, misclicks, corrections.
- Maintain long, multi-turn conversation state internally (no system prompt needed).
- Direct the client toward booking a call, meeting, or viewing.
- Stay within business rules and safely decline out-of-scope topics.
- Produce structured, concise operator-style messages (bullet points, quick summaries).
The model has been aligned across 4 SFT stages and 1 domain-pretrain stage to operate fully autonomously.
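When the model is served behind a chat-completions API, the multi-turn behavior described above still depends on the client resending earlier turns with every request, since such endpoints are stateless. A minimal sketch of that bookkeeping, assuming the standard role/content message format; `build_messages` and its parameters are hypothetical names used only for illustration:

```python
def build_messages(history, user_text, system_prompt=None):
    """Return a new message list: optional system header, prior turns, new user turn."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # Copy prior turns so the caller's history list is not mutated.
    messages.extend(history)
    messages.append({"role": "user", "content": user_text})
    return messages


# Illustrative history; the assistant text is a placeholder, not real model output.
history = [
    {"role": "user", "content": "здравствуйте"},
    {"role": "assistant", "content": "<previous model reply>"},
]
next_request = build_messages(history, "бюджет до 10 млн")
```

After each reply, append the assistant message to `history` so the next call carries the full dialogue.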
Training Stages Overview
(No hyperparameters disclosed — only conceptual behavior.)
Stage 0 — Continued Pretrain (Domain CPT)
Large corpus of Russian real-estate text; builds robust domain vocabulary, patterns, and document-level reasoning.
Stage 1 — Full 41k Prompt
Full system template + easy queries; teaches tone, etiquette, greetings, qualification patterns, and safety rules.
Stage 2 — Core 15k Rules
Mid-level compression; model begins internalizing main scripts, question ordering, CTAs, and objection handling.
Stage 3 — 3–5k Summary Prompt
High-compression stage; strengthens behavior even when template is short or partially omitted.
Stage 4 — Zero Prompt
Final distilled agent; fully internalized scripts, tone, policy, and flow — works with query-only inference.
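One way to picture the curriculum is that the same query/answer pair is paired with a progressively shorter system prompt at each stage, until Stage 4 drops it entirely. The sketch below is a conceptual illustration only: the model card does not publish the actual templates or data format, so `STAGE_PROMPTS`, its placeholder strings, and `make_training_example` are all assumptions.

```python
# Hypothetical placeholders; the real stage templates are not published.
STAGE_PROMPTS = {
    1: "<full system template>",
    2: "<core rules>",
    3: "<short summary prompt>",
    4: None,  # Stage 4: query-only, no system prompt
}


def make_training_example(stage, query, answer):
    """Build one chat-format training example for the given SFT stage."""
    messages = []
    prompt = STAGE_PROMPTS[stage]
    if prompt is not None:
        messages.append({"role": "system", "content": prompt})
    messages.append({"role": "user", "content": query})
    messages.append({"role": "assistant", "content": answer})
    return messages
```

The target answer stays fixed across stages while the system context shrinks, which is the sense in which the behavior is "internalized".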
Recommended Inference Settings
- Temperature: 0.1 (max stability; operator tone)
- Top-p: 1.0
- Max tokens: 2000
System prompt (optional but recommended):
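<system_instructions> Вы Александр Оператор по недвижимости в Центр Подбора Новостроек Ваша миссия определить квалифицированных потенциальных клиентов для приобретения новостроек и обеспечить их связь со специализированными консультантами </system_instructions>
(English gloss, for reference only; send the Russian text above to the model: "You are Alexander, a real-estate operator at Центр Подбора Новостроек, the New-Build Selection Center. Your mission is to identify qualified prospective buyers of new-build properties and connect them with specialized consultants.")
Stage 4 does not require a system prompt, but this short header helps keep the persona and tone consistent across runs.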
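The recommended settings can be collected into a single request payload. This is a sketch under the assumption that the model is called through an OpenAI-compatible chat-completions endpoint; `recommended_request` and `use_system_header` are hypothetical names, and the values are the ones listed above.

```python
MODEL_ID = "RAS1981/Qwen3-4B-outreach-stage4"

# Optional header from the model card, kept in Russian as the model expects it.
SYSTEM_HEADER = (
    "<system_instructions>"
    "Вы Александр Оператор по недвижимости в Центр Подбора Новостроек "
    "Ваша миссия определить квалифицированных потенциальных клиентов "
    "для приобретения новостроек и обеспечить их связь "
    "со специализированными консультантами"
    "</system_instructions>"
)


def recommended_request(user_text, use_system_header=True):
    """Build keyword arguments for a chat-completions call with the card's settings."""
    messages = []
    if use_system_header:
        messages.append({"role": "system", "content": SYSTEM_HEADER})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": MODEL_ID,
        "messages": messages,
        "temperature": 0.1,
        "top_p": 1.0,
        "max_tokens": 2000,
    }
```

With the `openai` Python client, the result can be passed as `client.chat.completions.create(**recommended_request("здравствуйте"))`.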
🚀 Quickstart: Run with vLLM (recommended)
1. Install environment
apt update && apt install -y python3-pip git # Basics (30s)
pip install uv # Fast resolver (recommended)
uv venv main --python 3.12 # Create isolated env
source main/bin/activate # Activate venv
uv pip install vllm # Install vLLM 0.11.0+ (2–5 min)
2. Serve the model with vLLM
vllm serve RAS1981/Qwen3-4B-outreach-stage4 \
--max-model-len 8000 \
--dtype auto \
--enable-chunked-prefill \
--max-num-batched-tokens 4000 \
--port 8000 \
--host 0.0.0.0 \
--api-key token-abc123 \
--trust-remote-code \
--enforce-eager \
--download-dir /tmp/hf_cache/models
vLLM will expose an OpenAI-compatible endpoint:
http://<server>:8000/v1
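Before sending chat requests, it can help to confirm that the server is up and that the API key is accepted. The sketch below uses only the Python standard library and assumes the server exposes the usual OpenAI-style `/v1/models` route with Bearer-token authentication; `build_models_request` and `list_model_ids` are hypothetical helper names.

```python
import json
import urllib.request


def build_models_request(base_url, api_key):
    """Build an authenticated GET request for the model-listing route."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


def list_model_ids(base_url, api_key, timeout=10.0):
    """Return the model ids the server reports; raises on network or auth errors."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload.get("data", [])]
```

For the server started above, `list_model_ids("http://localhost:8000/v1", "token-abc123")` should include the served model name if everything is configured as expected.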
🧪 TTFT Test (OpenAI client compatible)
import time

from openai import OpenAI

# The api_key must match the --api-key value passed to `vllm serve`.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)


def measure_ttft():
    """Stream one completion and print the time to first token in milliseconds."""
    start = time.perf_counter()
    first_seen = False
    response = client.chat.completions.create(
        model="RAS1981/Qwen3-4B-outreach-stage4",
        messages=[
            {
                "role": "system",
                "content": (
                    "<system_instructions>"
                    "Вы Александр Оператор по недвижимости в Центр Подбора Новостроек "
                    "Ваша миссия определить квалифицированных потенциальных клиентов "
                    "для приобретения новостроек и обеспечить их связь "
                    "со специализированными консультантами"
                    "</system_instructions>"
                ),
            },
            {"role": "user", "content": "здравствуйте"},
        ],
        max_tokens=2000,
        temperature=0.1,
        stream=True,
    )
    for chunk in response:
        # Some stream chunks carry no choices; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if not text:
            continue
        if not first_seen:
            # Elapsed time from request start to the first non-empty token.
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f">>> TTFT: {elapsed_ms:.0f} ms\n")
            first_seen = True
        print(text, end="", flush=True)


if __name__ == "__main__":
    measure_ttft()
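A single TTFT reading tends to be noisy, and the first request after server start often includes warm-up cost. A minimal sketch for summarizing several readings, assuming you have collected the printed TTFT values (in milliseconds) into a list; `summarize_ttft` and `skip_warmup` are hypothetical names, and the 95th percentile uses a simple nearest-rank rule:

```python
import math
import statistics


def summarize_ttft(samples_ms, skip_warmup=1):
    """Summarize TTFT readings after dropping the first `skip_warmup` warm-up runs."""
    steady = sorted(list(samples_ms)[skip_warmup:])
    if not steady:
        raise ValueError("need at least one sample after the warm-up runs")
    # Nearest-rank 95th percentile: smallest value with at least 95% of samples at or below it.
    p95_index = max(0, math.ceil(0.95 * len(steady)) - 1)
    return {
        "runs": len(steady),
        "median_ms": statistics.median(steady),
        "p95_ms": steady[p95_index],
        "max_ms": steady[-1],
    }
```

Reporting the median alongside a high percentile gives a more honest picture of latency than one measurement.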
Model tree for RAS1981/Qwen3-4B-outreach-stage4
Base model
RAS1981/Qwen3-4B-outreach-stage0