How to use LiquidAI/LFM2-8B-A1B-smoltalk-LoRA with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-8B-A1B")
model = PeftModel.from_pretrained(base_model, "LiquidAI/LFM2-8B-A1B-smoltalk-LoRA")

LoRA adapter for LiquidAI/LFM2-8B-A1B (8B total / 1B active MoE) fine-tuned on SmolTalk. It is a standard PEFT-native artifact that works with PeftModel.from_pretrained, vLLM, and SGLang.
Trained with PEFT 0.18+'s target_parameters so the LoRA reaches the batched 3D MoE expert weights (experts.gate_up_proj, experts.down_proj) in addition to the dense modules.
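A configuration along the following lines would reach both the dense modules and the batched expert weights. It is a minimal sketch, not the actual training script: the module and parameter names and the rank, alpha, and dropout values come from this card, `task_type` is an assumption, and `LORA_SETTINGS` and `build_lora_config` are hypothetical names. The PEFT import is deferred so the settings can be inspected without PEFT installed.

```python
# Settings mirrored from this card; task_type is an assumption.
LORA_SETTINGS = dict(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,  # the card notes that PEFT's ParamWrapper requires 0
    # Dense nn.Linear layers, matched by module name.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "out_proj",
        "w1", "w2", "w3", "in_proj", "gate",
    ],
    # Batched 3D expert weights are parameters, not modules,
    # so they are listed under target_parameters instead.
    target_parameters=["experts.gate_up_proj", "experts.down_proj"],
    task_type="CAUSAL_LM",
)


def build_lora_config():
    """Return a peft.LoraConfig; needs a PEFT release with target_parameters."""
    from peft import LoraConfig

    return LoraConfig(**LORA_SETTINGS)
```

With PEFT installed, `peft.get_peft_model(base_model, build_lora_config())` would wrap the base model for training.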
| Parameter | Value |
|---|---|
| Base model | LiquidAI/LFM2-8B-A1B |
| Dataset | HuggingFaceTB/smoltalk (5000-sample slice) |
| Rank (r) | 8 |
| Alpha | 16 |
| Dropout | 0 (required by PEFT ParamWrapper) |
| Learning rate | 5e-5 (linear schedule, warmup 0.1) |
| Epochs | 3 |
| Effective batch size | 64 |
| Eval loss | 0.974 → 0.852 → 0.827 (epoch 1 → 2 → 3) |
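Two numbers in the table can be read off with a little arithmetic. The sketch below assumes standard LoRA scaling (alpha / r); the card does not say whether rsLoRA was enabled, and if it was, the factor would differ. The variable names are illustrative.

```python
rank, alpha = 8, 16

# Standard LoRA multiplies the low-rank update B @ A by alpha / r.
# Assumption: rsLoRA (alpha / sqrt(r)) was not used.
scaling = alpha / rank

# Eval loss after epochs 1, 2, and 3, copied from the table.
eval_loss = [0.974, 0.852, 0.827]
relative_drop = (eval_loss[0] - eval_loss[-1]) / eval_loss[0]

print(f"LoRA scaling factor: {scaling}")
print(f"Relative eval-loss drop, epoch 1 to 3: {relative_drop:.1%}")
```

Most of the improvement happens between epochs 1 and 2; the third epoch adds a smaller gain.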
LoRA targets, by group:
- q_proj, k_proj, v_proj, out_proj
- w1, w2, w3
- in_proj, out_proj
- gate
- experts.gate_up_proj, experts.down_proj (via target_parameters)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
MODEL_ID = "LiquidAI/LFM2-8B-A1B"
LORA_ID = "LiquidAI/LFM2-8B-A1B-smoltalk-LoRA"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
)
prompt = "What are some ideas for a good short story about a city not on a planet, but rather a generation ship, or on the moon of a gas giant, or somewhere else unusual?"
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    outputs = base_model.generate(**inputs, max_new_tokens=100, do_sample=False)
print("Base:", tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
lora_model = PeftModel.from_pretrained(base_model, LORA_ID)
with torch.no_grad():
    outputs = lora_model.generate(**inputs, max_new_tokens=100, do_sample=False)
print("LoRA:", tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Requires a build of SGLang with native PEFT 0.18+ batched-MoE LoRA loading.
Launch:
python -m sglang.launch_server \
--model-path LiquidAI/LFM2-8B-A1B \
--port 30000 \
--enable-lora --max-lora-rank 8 \
--lora-paths "smoltalk=LiquidAI/LFM2-8B-A1B-smoltalk-LoRA" \
--lora-target-modules q_proj k_proj v_proj out_proj w1 w2 w3 in_proj gate gate_up_proj down_proj
Call with LoRA (Option 1: lora_path field):
curl -sS http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "LiquidAI/LFM2-8B-A1B",
"lora_path": "smoltalk",
"messages": [{"role": "user", "content": "What are some ideas for a good short story about a city not on a planet, but rather a generation ship, or on the moon of a gas giant, or somewhere else unusual?"}],
"max_tokens": 64,
"temperature": 0
}'
Call with LoRA (Option 2: colon syntax):
curl -sS http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "LiquidAI/LFM2-8B-A1B:smoltalk",
"messages": [{"role": "user", "content": "What are some ideas for a good short story about a city not on a planet, but rather a generation ship, or on the moon of a gas giant, or somewhere else unusual?"}],
"max_tokens": 64,
"temperature": 0
}'
Call without LoRA (base model):
curl -sS http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "LiquidAI/LFM2-8B-A1B",
"messages": [{"role": "user", "content": "What are some ideas for a good short story about a city not on a planet, but rather a generation ship, or on the moon of a gas giant, or somewhere else unusual?"}],
"max_tokens": 64,
"temperature": 0
}'
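The three SGLang requests above differ only in how the adapter is named, which a small Python helper can make explicit. This is a sketch using only the standard library; `build_payload` and `chat` are hypothetical helpers, and the URL and the adapter name `smoltalk` are taken from the launch command in this card.

```python
import json
import urllib.request

BASE_MODEL = "LiquidAI/LFM2-8B-A1B"
URL = "http://localhost:30000/v1/chat/completions"


def build_payload(prompt, adapter=None, use_colon_syntax=False, max_tokens=64):
    """Build a chat-completions body for the base model or a named adapter."""
    payload = {
        "model": BASE_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    if adapter and use_colon_syntax:
        # Option 2: append the adapter name to the model id.
        payload["model"] = f"{BASE_MODEL}:{adapter}"
    elif adapter:
        # Option 1: pass the adapter name in a separate lora_path field.
        payload["lora_path"] = adapter
    return payload


def chat(payload, url=URL, timeout=60):
    """POST the payload and return the reply text (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat(build_payload(prompt))` and `chat(build_payload(prompt, adapter="smoltalk"))` give a side-by-side comparison of base and adapter output for the same prompt.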
Launch with vLLM:
vllm serve LiquidAI/LFM2-8B-A1B \
--host 0.0.0.0 --port 30000 \
--dtype bfloat16 \
--enable-lora --max-lora-rank 8 \
--lora-modules "smoltalk=LiquidAI/LFM2-8B-A1B-smoltalk-LoRA" \
--gpu-memory-utilization 0.5
Call with LoRA (the adapter name goes in the model field):
curl -sS http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "smoltalk",
"messages": [{"role": "user", "content": "..."}],
"max_tokens": 64,
"temperature": 0
}'
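The same vLLM call can be made from Python. The sketch below only builds the request so it can be inspected without a server; `vllm_chat_request` is a hypothetical helper, and the port and adapter name follow the `vllm serve` command above.

```python
import json
import urllib.request

VLLM_URL = "http://localhost:30000/v1/chat/completions"


def vllm_chat_request(prompt, model="smoltalk", url=VLLM_URL, max_tokens=64):
    """Build a POST request that selects the adapter through the model field."""
    body = {
        "model": model,  # the name registered with --lora-modules
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` while the server is running. Passing `model="LiquidAI/LFM2-8B-A1B"` should target the base model, assuming vLLM's default served model name.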