Libraries

How to use astom-M/matsuo-llm-advanced-phase-a with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="astom-M/matsuo-llm-advanced-phase-a")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("astom-M/matsuo-llm-advanced-phase-a")
model = AutoModelForCausalLM.from_pretrained("astom-M/matsuo-llm-advanced-phase-a")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use astom-M/matsuo-llm-advanced-phase-a with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "astom-M/matsuo-llm-advanced-phase-a"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "astom-M/matsuo-llm-advanced-phase-a",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/astom-M/matsuo-llm-advanced-phase-a

SGLang

How to use astom-M/matsuo-llm-advanced-phase-a with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "astom-M/matsuo-llm-advanced-phase-a" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "astom-M/matsuo-llm-advanced-phase-a",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "astom-M/matsuo-llm-advanced-phase-a" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "astom-M/matsuo-llm-advanced-phase-a",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Qwen2.5-7B-Instruct + Phase A Multi-Benchmark LoRA

Model Description

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct optimized for multi-benchmark agent tasks (ALFWorld + DBBench).

Key characteristics:

Base model: Qwen2.5-7B-Instruct
Training method: bf16 LoRA (not 4-bit QLoRA), so merging the adapter needs no dequantization step and avoids the rounding error that step introduces
Format: bfloat16 safetensors (no quantization)
Size: ~15GB
Compatible with: vLLM v0.13.0+, transformers, etc.

Training Details

LoRA Configuration

Parameter	Value
LoRA rank (r)	16
LoRA alpha	32
LoRA dropout	0
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable params	~0.28% of total
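
As a rough illustration of what the configuration above implies, the sketch below computes the LoRA scaling factor and the number of adapter parameters added to one linear layer. It assumes standard LoRA scaling (alpha / r, not the rank-stabilized variant), and the layer dimensions in the comment are hypothetical placeholders rather than values taken from this model.

```python
# LoRA hyperparameters from the table above.
LORA_RANK = 16
LORA_ALPHA = 32


def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the low-rank update B @ A before it is added to W.

    Assumption: standard LoRA scaling (alpha / rank).
    """
    return alpha / rank


def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    """Adapter parameters for one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Scaling for this configuration: 32 / 16.
scaling = lora_scaling(LORA_ALPHA, LORA_RANK)

# Hypothetical 1024 x 1024 projection, for illustration only.
example_params = lora_params_per_layer(1024, 1024, LORA_RANK)
```

With these settings the merged weight for each target module is W + 2.0 * (B @ A), so the adapter update is doubled relative to the raw low-rank product.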

Training Hyperparameters

Parameter	Value
Learning rate	3e-5
Epochs	1.0
Batch size (effective)	16 (4 × 4 grad accum)
Max sequence length	4096
LR scheduler	linear
Optimizer	AdamW 8-bit
Warmup ratio	0.03
Weight decay	0.01
Precision	bfloat16
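
To make the schedule concrete, here is a minimal sketch of a linear schedule with warmup using the learning rate and warmup ratio from the table and the 225 training steps reported under Training Results. It assumes the warmup length is rounded up to a whole step and that the rate decays linearly to zero by the last step; the training framework actually used may differ in these details.

```python
import math

BASE_LR = 3e-5       # learning rate from the table
WARMUP_RATIO = 0.03  # warmup ratio from the table
TOTAL_STEPS = 225    # training steps reported under Training Results


def warmup_steps(total_steps: int = TOTAL_STEPS, ratio: float = WARMUP_RATIO) -> int:
    # Assumption: the warmup length is rounded up to a whole number of steps.
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR, ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step.

    Ramps linearly from 0 to base_lr during warmup, then decays
    linearly back to 0 at total_steps.
    """
    warm = warmup_steps(total_steps, ratio)
    if step < warm:
        return base_lr * step / max(1, warm)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warm)
```

Under these assumptions the warmup lasts 7 steps, after which the rate falls steadily from 3e-5 toward zero.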

Training Data

Total samples: 3,500
Composition:
- Official DBBench v4: 1,200 samples (34.3%)
- Official ALFWorld v5: 1,050 samples (30.0%)
- Existing Spider/BIRD: 1,250 samples (35.7%)
Sources:
- DBBench: u-10bei/dbbench_sft_dataset_react_v4
- ALFWorld: u-10bei/sft_alfworld_trajectory_dataset_v5
- Existing: Spider (Yale) + BIRD (HKU)
Generation method: Official datasets + template-based synthetic data
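
The percentages in the composition list follow directly from the sample counts. The short sketch below recomputes them; the dictionary keys are just labels chosen for this example.

```python
# Sample counts taken from the composition list above.
SAMPLE_COUNTS = {
    "DBBench v4": 1200,
    "ALFWorld v5": 1050,
    "Spider/BIRD": 1250,
}
# The two sources described as official competition datasets.
OFFICIAL_SOURCES = ("DBBench v4", "ALFWorld v5")


def composition(counts: dict) -> dict:
    """Percentage share of each source, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def official_share(counts: dict, official=OFFICIAL_SOURCES) -> float:
    """Combined percentage share of the official datasets."""
    total = sum(counts.values())
    return round(100 * sum(counts[name] for name in official) / total, 1)
```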

Training Results

Training steps: 225
Training time: 12.4 minutes (RTX 5090)
Best checkpoint: step 150
Train loss: 0.6436 → 0.2643 (59% reduction)
Eval loss: 0.6588 → 0.2769 (58% reduction)
Best eval loss: 0.2769
Peak VRAM: ~26GB / 32GB
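
The loss percentages above are relative changes from the first to the last reported value. A minimal sketch of that arithmetic:

```python
def relative_reduction(start: float, end: float) -> float:
    """Fractional decrease from start to end, e.g. 0.5 for a 50% drop."""
    return (start - end) / start


# Loss values reported in the list above.
train_reduction_pct = round(100 * relative_reduction(0.6436, 0.2643))
eval_reduction_pct = round(100 * relative_reduction(0.6588, 0.2769))
```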

Performance Metrics

Benchmark	Score
DBBench (expected)	55%+
ALFWorld (expected)	65%+

Usage

Basic Usage with Transformers

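from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_ID = "astom-M/matsuo-llm-advanced-phase-a"

# Load the merged bf16 weights; device_map="auto" places them on the available device(s).
# trust_remote_code is omitted because the Qwen2 architecture is built into transformers.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Format a chat prompt with the model's chat template, generate, and decode only the new tokens.
messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))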

vLLM Deployment

python -m vllm.entrypoints.openai.api_server \
    --model astom-M/matsuo-llm-advanced-phase-a \
    --dtype bfloat16 \
    --max-model-len 4096
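
Once the server is running, it can be queried from Python through its OpenAI-compatible chat endpoint. The sketch below uses only the standard library and assumes the server listens on localhost port 8000, which is vLLM's default when no `--port` is given; `build_chat_request` and `chat` are helper names invented for this example.

```python
import json
import urllib.request

# Assumption: server started as shown above, on vLLM's default port.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "astom-M/matsuo-llm-advanced-phase-a"


def build_chat_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `chat("What is the capital of France?")` returns the generated reply as a string. Keep the prompt plus `max_tokens` within the 4096-token limit set by `--max-model-len`.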

Important Notes

No quantization artifacts: Trained with bf16 LoRA (not 4-bit QLoRA), so the merge avoids the rounding error introduced when a 4-bit base model is dequantized to bf16
config.json does NOT contain quantization_config, so the checkpoint loads as a plain bf16 model
All safetensors weights are in torch.bfloat16 dtype
Multi-benchmark optimization: Balanced training across ALFWorld and DBBench tasks
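
The config.json claims above can be checked on a downloaded copy of the repository. This is a minimal sketch, assuming the dtype is stored under the `torch_dtype` key (or `dtype`, which newer transformers releases write); the function names are invented for this example.

```python
import json
from pathlib import Path


def is_unquantized_bf16(config: dict) -> bool:
    """True if the config has no quantization_config and declares bfloat16."""
    # Assumption: the dtype is recorded under "torch_dtype" or "dtype".
    dtype = config.get("torch_dtype", config.get("dtype"))
    return "quantization_config" not in config and dtype == "bfloat16"


def check_config_file(path: str) -> bool:
    """Load a local config.json and apply the check above."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return is_unquantized_bf16(config)
```

This only inspects the declared configuration; confirming the dtype of every tensor would require reading the safetensors files themselves.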

Compliance

Base model: Qwen2.5-7B-Instruct (Apache 2.0 license, whitelisted for competition)
Training data: Official competition datasets + template-based synthetic data
No inference code modification
No RAG/ToolUse
No commercial API usage

Training Strategy

This model was trained as Phase A of a multi-phase optimization strategy:

Goal: Improve base model performance on both ALFWorld (household tasks) and DBBench (SQL generation)
Approach: Conservative LoRA fine-tuning with balanced dataset composition
Constraint: Must maintain compatibility with the production evaluation environment (YAML changes are not allowed)

The training data composition was carefully balanced to:

Leverage official competition datasets (64.3%)
Preserve base model capabilities through existing data (35.7%)
Avoid catastrophic forgetting through moderate learning rate and careful hyperparameter tuning

License

Apache 2.0 (inherited from Qwen2.5-7B-Instruct)

Model Card Metadata:

Model size: 8B parameters
Tensor type: BF16
Format: Safetensors
Training date: 2026-02-16
