# lora_structeval_t_qwen3_4b_0118
This repository provides a LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth (QLoRA, 4-bit base).
- Contents: LoRA adapter weights (PEFT) + tokenizer files (if present)
- Does not include: Base model weights, training dataset files
## Training Objective

This adapter was trained to improve structured-output quality (format conversion / structured serialization) while avoiding learning to emit verbose chain-of-thought.
### Loss design
- The model sees the full conversation context (system + user + assistant).
- Loss is applied only to the final assistant turn ("assistant-only loss").
- Additionally, when an output marker is present, loss is applied only to the text after the marker (`OUTPUT_LEARN_MODE="after_marker"`).
- Markers searched: `Output:`, `OUTPUT:`, `Final:`, `Answer:`, `Result:`, `Response:`
- With `MASK_COT=True`, this typically means learning the content after `Output:` (suppressing CoT-style "Approach:" text).
This setup is intended to improve final answer correctness and formatting without encouraging the model to emit chain-of-thought.
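The marker-based masking can be sketched as follows. This is a simplified, hypothetical helper (not the actual training code): it works on pre-split token strings, whereas a real tokenizer may split a marker across several subword tokens.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

MARKERS = ("Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:")

def mask_labels_after_marker(tokens, labels):
    """Keep loss only on tokens at or after the first marker token.

    `tokens` is a list of token strings for the assistant turn; `labels`
    is the parallel list of label ids. Tokens before the marker are set
    to IGNORE_INDEX so they contribute no loss.
    """
    marker_pos = None
    for i, tok in enumerate(tokens):
        if any(tok.startswith(m) for m in MARKERS):
            marker_pos = i
            break
    if marker_pos is None:
        # No marker found: fall back to full assistant-turn loss.
        return labels
    return [IGNORE_INDEX] * marker_pos + labels[marker_pos:]

tokens = ["Approach:", "think", "step", "Output:", '{"a":', "1}"]
labels = [10, 11, 12, 13, 14, 15]
print(mask_labels_after_marker(tokens, labels))
# → [-100, -100, -100, 13, 14, 15]
```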
## Training Configuration (Key)

- Run stamp (UTC): `2026-01-18_062458Z`
- Base model: `unsloth/Qwen3-4B-Instruct-2507`
- Dataset: `u-10bei/structured_data_with_cot_dataset_512_v2`
- Method: QLoRA (4-bit base) + LoRA adapters (PEFT)
- Max sequence length: `512`
- Seed: `3407`
- Train/Val split: `val_ratio=0.05`
## Hyperparameters

- Epochs: `2`
- LR: `0.0001`
- Warmup ratio: `0.1`
- Weight decay: `0.05`
- Per-device train batch: `2`
- Gradient accumulation: `8` (effective batch ≈ 16)
- LR scheduler: cosine
- Precision: fp16 (T4-friendly)
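The run was presumably launched through Unsloth's trainer, but the hyperparameters above map onto a plain `transformers.TrainingArguments` roughly like this (illustrative only; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",               # placeholder path
    num_train_epochs=2,
    learning_rate=1e-4,
    warmup_ratio=0.1,
    weight_decay=0.05,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,      # effective batch = 2 * 8 = 16
    lr_scheduler_type="cosine",
    fp16=True,
    seed=3407,
)
```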
## LoRA

- r: `64`
- alpha: `128`
- dropout: `0.0`
- target_modules: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
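As a PEFT `LoraConfig`, the settings above would look roughly like this (a sketch, not the exact training config; `bias` and `task_type` are assumed defaults):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumed
    task_type="CAUSAL_LM",  # assumed
)
```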
## Prompt / Output Style (Dataset-aligned)
The training dataset uses chat messages and often includes a short reasoning header followed by a final structured output. With the default masking setup, the adapter is optimized primarily for the final structured segment.
Typical assistant response shape:

- `Approach:` (may be present, but often masked from loss)
- `Output:` (structured data begins here; primary training target)
You can encourage concise responses by explicitly requesting: "Return ONLY the final structured output."
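For example, a dataset-aligned chat request might look like this (the message contents are illustrative, not taken from the training data):

```python
messages = [
    {
        "role": "system",
        "content": "You convert data between structured formats.",
    },
    {
        "role": "user",
        "content": (
            "Convert this CSV to JSON. "
            "Return ONLY the final structured output.\n\n"
            "name,age\nAlice,30"
        ),
    },
]
```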
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "unsloth/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/lora_structeval_t_qwen3_4b_0118"

tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Example: run generation with your preferred chat template usage.
```
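After generating (e.g. via `tokenizer.apply_chat_template` and `model.generate`), you may want to strip any `Approach:` header and keep only the structured segment. A hypothetical post-processing helper:

```python
def extract_output(text: str, marker: str = "Output:") -> str:
    """Return the text after the first `marker`, or the full text if absent."""
    idx = text.find(marker)
    if idx == -1:
        return text.strip()
    return text[idx + len(marker):].strip()

print(extract_output('Approach: convert fields.\nOutput: {"name": "Alice"}'))
# → {"name": "Alice"}
```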
## Limitations / Notes

- This is a LoRA adapter, not a standalone model. You must load `unsloth/Qwen3-4B-Instruct-2507` separately.
- Format correctness depends on your decoding settings and prompt discipline. For strict tasks, consider:
  - `temperature=0` (or low), `top_p=1.0`
  - Post-validation (JSON/YAML/TOML/XML parsers) where applicable
- The adapter is specialized for structured serialization/format conversion; it may not improve general chat ability.
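Post-validation can be as simple as parsing the output and rejecting malformed results before use. A minimal JSON-only sketch (YAML/TOML/XML would use their respective parsers):

```python
import json

def validate_json(candidate: str):
    """Parse model output as JSON; return (ok, parsed_value_or_error_message)."""
    try:
        return True, json.loads(candidate)
    except json.JSONDecodeError as exc:
        return False, str(exc)

ok, parsed = validate_json('{"name": "Alice", "age": 30}')
print(ok, parsed)
# → True {'name': 'Alice', 'age': 30}
```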
## Sources & Terms (IMPORTANT)

- Training dataset: `u-10bei/structured_data_with_cot_dataset_512_v2` (referenced on the Hugging Face Hub). This repository contains LoRA adapter weights only and does not redistribute the training dataset.
- You are responsible for complying with:
  - The dataset license/terms as stated in the dataset repository.
  - The base model license/terms for `unsloth/Qwen3-4B-Instruct-2507` (these apply to derivatives/adapters as well).
## License

- Adapter repo license field: `other` (model card metadata)
- Important: base model terms for `unsloth/Qwen3-4B-Instruct-2507` apply, and dataset terms for `u-10bei/structured_data_with_cot_dataset_512_v2` apply.