LFM2-8B-A1B SmolTalk LoRA

LoRA adapter for LiquidAI/LFM2-8B-A1B (8B total / 1B active MoE) fine-tuned on SmolTalk. Standard PEFT-native artifact: works with PeftModel.from_pretrained, vLLM, and SGLang.

Trained with PEFT 0.18+'s target_parameters so the LoRA reaches the batched 3D MoE expert weights (experts.gate_up_proj, experts.down_proj) in addition to the dense modules.

Training

| Parameter | Value |
|---|---|
| Base model | LiquidAI/LFM2-8B-A1B |
| Dataset | HuggingFaceTB/smoltalk (5000-sample slice) |
| Rank (r) | 8 |
| Alpha | 16 |
| Dropout | 0 (required by PEFT ParamWrapper) |
| Learning rate | 5e-5 (linear schedule, warmup 0.1) |
| Epochs | 3 |
| Effective batch size | 64 |
| Eval loss | 0.974 → 0.852 → 0.827 (epoch 1 → 2 → 3) |
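
The numbers in the table can be turned into a few derived quantities. The sketch below is only a back-of-the-envelope check: the LoRA scaling assumes the default alpha / r rule (not rsLoRA), and the step counts assume that all 5000 samples are used for training and that the warmup value is a ratio of total steps. Neither assumption is confirmed by this card.

```python
import math

# Values copied from the training table.
rank = 8
alpha = 16
eval_loss = [0.974, 0.852, 0.827]  # eval loss after epochs 1, 2, 3

# Default LoRA scaling: the low-rank update is multiplied by alpha / r.
scaling = alpha / rank

# Relative eval-loss reduction between epoch 1 and epoch 3.
relative_drop = (eval_loss[0] - eval_loss[-1]) / eval_loss[0]

# ASSUMPTION: the whole 5000-sample slice is the training set and
# "warmup 0.1" means 10% of the optimizer steps.
samples, effective_batch, epochs, warmup_ratio = 5000, 64, 3, 0.1
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"LoRA scaling: {scaling}")
print(f"Eval loss reduction: {relative_drop:.1%}")
print(f"Steps per epoch: {steps_per_epoch}, total: {total_steps}, warmup: {warmup_steps}")
```

If part of the slice was held out for evaluation, the step counts would be slightly lower.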

LoRA targets

  • Attention: q_proj, k_proj, v_proj, out_proj
  • Dense MLP (layers 0–1): w1, w2, w3
  • ShortConv: in_proj, out_proj
  • MoE router: gate
  • MoE experts (batched 3D): experts.gate_up_proj, experts.down_proj
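
For readers who want to reproduce a similar setup, the targets above map onto two separate fields of a PEFT LoraConfig: target_modules for ordinary modules and target_parameters for the batched 3D expert weights. The following is a minimal sketch, not the configuration actually used for this adapter. The exact name strings, the task_type, and the choice to combine both fields in one config are assumptions drawn from the lists on this card.

```python
# Keyword arguments mirroring the training table and the LoRA target list.
# ASSUMPTION: these name strings match how the adapter was really configured.
lora_kwargs = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,  # the card states dropout 0 is required by PEFT ParamWrapper
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "out_proj",  # attention (out_proj also names the ShortConv output)
        "w1", "w2", "w3",                          # dense MLP
        "in_proj",                                 # ShortConv
        "gate",                                    # MoE router
    ],
    # Batched 3D expert weights are nn.Parameter tensors, not nn.Linear modules,
    # so they are listed under target_parameters instead of target_modules.
    "target_parameters": ["experts.gate_up_proj", "experts.down_proj"],
    "task_type": "CAUSAL_LM",
}

try:
    from peft import LoraConfig

    lora_config = LoraConfig(**lora_kwargs)
except (ImportError, TypeError):
    # peft is missing, or too old to know target_parameters.
    lora_config = None

print("LoraConfig built:", lora_config is not None)
```

Because out_proj appears in both the attention and ShortConv blocks, a single entry in target_modules is expected to cover both when matching by module-name suffix.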

Usage

Python (PEFT)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

MODEL_ID = "LiquidAI/LFM2-8B-A1B"
LORA_ID = "LiquidAI/LFM2-8B-A1B-smoltalk-LoRA"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
)

prompt = "What are some ideas for a good short story about a city not on a planet, but rather a generation ship, or on the moon of a gas giant, or somewhere else unusual?"
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(base_model.device)

with torch.no_grad():
    outputs = base_model.generate(**inputs, max_new_tokens=100, do_sample=False)
print("Base:", tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

lora_model = PeftModel.from_pretrained(base_model, LORA_ID)
with torch.no_grad():
    outputs = lora_model.generate(**inputs, max_new_tokens=100, do_sample=False)
print("LoRA:", tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

SGLang

Requires a build of SGLang with native PEFT 0.18+ batched-MoE LoRA loading.

Launch:

python -m sglang.launch_server \
    --model-path LiquidAI/LFM2-8B-A1B \
    --port 30000 \
    --enable-lora --max-lora-rank 8 \
    --lora-paths "smoltalk=LiquidAI/LFM2-8B-A1B-smoltalk-LoRA" \
    --lora-target-modules q_proj k_proj v_proj out_proj w1 w2 w3 in_proj gate gate_up_proj down_proj

Call with LoRA (Option 1: lora_path field):

curl -sS http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "LiquidAI/LFM2-8B-A1B",
      "lora_path": "smoltalk",
      "messages": [{"role": "user", "content": "What are some ideas for a good short story about a city not on a planet, but rather a generation ship, or on the moon of a gas giant, or somewhere else unusual?"}],
      "max_tokens": 64,
      "temperature": 0
    }'

Call with LoRA (Option 2: colon syntax):

curl -sS http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "LiquidAI/LFM2-8B-A1B:smoltalk",
      "messages": [{"role": "user", "content": "What are some ideas for a good short story about a city not on a planet, but rather a generation ship, or on the moon of a gas giant, or somewhere else unusual?"}],
      "max_tokens": 64,
      "temperature": 0
    }'

Call without LoRA (base model):

curl -sS http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "LiquidAI/LFM2-8B-A1B",
      "messages": [{"role": "user", "content": "What are some ideas for a good short story about a city not on a planet, but rather a generation ship, or on the moon of a gas giant, or somewhere else unusual?"}],
      "max_tokens": 64,
      "temperature": 0
    }'
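
The three curl calls differ only in how the adapter is selected, which can be captured in one small Python helper. This is a sketch using only the standard library. The function names build_payload and chat are hypothetical, and the response parsing assumes the usual OpenAI-style choices[0].message.content layout.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000/v1/chat/completions"
BASE_MODEL = "LiquidAI/LFM2-8B-A1B"


def build_payload(prompt, adapter=None, style="field", max_tokens=64):
    """Build a chat-completions body; adapter=None targets the base model."""
    payload = {
        "model": BASE_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    if adapter is not None:
        if style == "field":
            # Option 1: separate lora_path field.
            payload["lora_path"] = adapter
        elif style == "colon":
            # Option 2: adapter name appended to the model name.
            payload["model"] = f"{BASE_MODEL}:{adapter}"
        else:
            raise ValueError(f"unknown style: {style!r}")
    return payload


def chat(payload, url=BASE_URL, timeout=60):
    """POST the payload to a running server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # ASSUMPTION: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server from the launch command running, chat(build_payload(prompt, adapter="smoltalk")) and chat(build_payload(prompt)) give the LoRA and base completions for the same prompt, which makes side-by-side comparison straightforward.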

vLLM

vllm serve LiquidAI/LFM2-8B-A1B \
    --host 0.0.0.0 --port 30000 \
    --dtype bfloat16 \
    --enable-lora --max-lora-rank 8 \
    --lora-modules "smoltalk=LiquidAI/LFM2-8B-A1B-smoltalk-LoRA" \
    --gpu-memory-utilization 0.5

Call with LoRA (the adapter name is used as the model):

curl -sS http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "smoltalk",
      "messages": [{"role": "user", "content": "..."}],
      "max_tokens": 64,
      "temperature": 0
    }'
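
Before sending requests, it can help to confirm that the adapter was registered under the name given to --lora-modules. The sketch below assumes the server exposes an OpenAI-style /v1/models endpoint that lists LoRA adapters alongside the base model. The helper names are hypothetical, and the sample dictionary is hand-written for illustration rather than captured from a real server.

```python
import json
import urllib.request


def listed_model_ids(models_response):
    """Return the ids found in an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_models(base_url="http://localhost:30000", timeout=10):
    """Fetch the model list from a running server."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as response:
        return json.load(response)


# Hand-written sample shaped like a model list (illustrative only).
sample = {
    "object": "list",
    "data": [
        {"id": "LiquidAI/LFM2-8B-A1B", "object": "model"},
        {"id": "smoltalk", "object": "model"},
    ],
}

print("smoltalk" in listed_model_ids(sample))
```

Against a live server, listed_model_ids(fetch_models()) replaces the sample. If smoltalk is missing from the result, requests with "model": "smoltalk" are likely to be rejected.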