lora_structeval_t_qwen3_4b_0118

This repository provides a LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth (QLoRA, 4-bit base).

  • Contents: LoRA adapter weights (PEFT) + tokenizer files (if present)
  • Does not include: Base model weights, training dataset files

Training Objective

This adapter was trained to improve structured-output quality (format conversion / structured serialization) while avoiding learning to emit verbose chain-of-thought.

Loss design

  • The model sees the full conversation context (system + user + assistant).
  • Loss is applied only to the final assistant turn ("assistant-only loss").
  • Additionally, when an output marker is present (OUTPUT_LEARN_MODE="after_marker"), loss is restricted to the text after the marker:
    • Markers searched: Output:, OUTPUT:, Final:, Answer:, Result:, Response:
    • With MASK_COT=True, this typically means learning only the content after Output: (suppressing CoT-style "Approach:" text).

This setup is intended to improve final answer correctness and formatting without encouraging the model to emit chain-of-thought.
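The masking described above can be illustrated at the text level. This is a simplified sketch (the actual training code operates on token-level labels, and the names here, such as `mask_before_marker`, are illustrative, not from the training scripts):

```python
# Sketch of "after_marker" loss masking: only the span at and after the
# first recognized marker would contribute to the loss. Without a marker,
# the full assistant turn is used (assistant-only loss).
MARKERS = ["Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:"]

def mask_before_marker(text: str) -> str:
    """Return the part of an assistant turn that would contribute to the loss."""
    positions = [text.find(m) for m in MARKERS if m in text]
    if not positions:
        return text  # no marker found: fall back to the full assistant turn
    return text[min(positions):]

assistant = 'Approach: parse the key-value pairs.\nOutput: {"name": "Alice"}'
print(mask_before_marker(assistant))  # → Output: {"name": "Alice"}
```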


Training Configuration (Key)

  • Run stamp (UTC): 2026-01-18_062458Z
  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
  • Method: QLoRA (4-bit base) + LoRA adapters (PEFT)
  • Max sequence length: 512
  • Seed: 3407
  • Train/Val split: val_ratio=0.05

Hyperparameters

  • Epochs: 2
  • LR: 0.0001
  • Warmup ratio: 0.1
  • Weight decay: 0.05
  • Per-device train batch: 2
  • Gradient accumulation: 8 (effective batch ≈ 16)
  • LR scheduler: cosine
  • Precision: fp16 (T4-friendly)

LoRA

  • r: 64
  • alpha: 128
  • dropout: 0.0
  • target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Prompt / Output Style (Dataset-aligned)

The training dataset uses chat messages and often includes a short reasoning header followed by a final structured output. With the default masking setup, the adapter is optimized primarily for the final structured segment.

Typical assistant response shape:

  • Approach: (may be present, but often masked from loss)
  • Output: (structured data begins here; primary training target)

You can encourage concise responses by explicitly requesting: "Return ONLY the final structured output."
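At inference time you can pair that instruction with a small post-processing step that keeps only the structured segment if the model still emits an "Approach:" header. The system text and helper below are examples, not part of the training set:

```python
# Example prompt shape plus a helper that drops any "Approach:" preamble,
# keeping only the content after the Output: marker when one is present.
messages = [
    {"role": "system", "content": "You convert data between formats. "
                                  "Return ONLY the final structured output."},
    {"role": "user", "content": "Convert to JSON: name=Alice, age=30"},
]

def strip_approach(response: str) -> str:
    marker = "Output:"
    idx = response.find(marker)
    return response[idx + len(marker):].strip() if idx != -1 else response.strip()

print(strip_approach('Approach: read the pairs.\nOutput: {"name": "Alice", "age": 30}'))
# → {"name": "Alice", "age": 30}
```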


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "unsloth/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/lora_structeval_t_qwen3_4b_0118"

tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Example: greedy decoding via the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Convert to JSON: name=Alice, age=30"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Limitations / Notes

  • This is a LoRA adapter, not a standalone model. You must load unsloth/Qwen3-4B-Instruct-2507 separately.

  • Format correctness depends on your decoding settings and prompt discipline. For strict tasks, consider:

    • temperature=0 (or low), top_p=1.0
    • Post-validation (JSON/YAML/TOML/XML parsers) where applicable
  • The adapter is specialized for structured serialization/format conversion; it may not improve general chat ability.
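For JSON targets, the post-validation step above can be as small as a `json.loads` wrapper; the retry/fallback policy is left to the caller. A minimal sketch:

```python
import json

def validate_json(text: str):
    """Return (parsed_object, None) on success, or (None, error_message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as e:
        return None, str(e)

obj, err = validate_json('{"name": "Alice", "age": 30}')
print(obj)              # parsed dict on success
obj, err = validate_json('{"name": Alice}')
print(err is not None)  # → True for malformed output
```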


Sources & Terms (IMPORTANT)

  • Training dataset: u-10bei/structured_data_with_cot_dataset_512_v2 (referenced on Hugging Face Hub)
  • This repository contains LoRA adapter weights only and does not redistribute the training dataset.
  • You are responsible for complying with:
    • The dataset license/terms as stated in the dataset repository.
    • The base model license/terms for unsloth/Qwen3-4B-Instruct-2507 (these apply to derivatives/adapters as well).

License

  • Adapter repo license field: other (model card metadata)
  • Important: Base model terms for unsloth/Qwen3-4B-Instruct-2507 apply. Dataset terms for u-10bei/structured_data_with_cot_dataset_512_v2 apply.