Qwen3-0.6B Chat CoreML

CoreML conversion of Qwen/Qwen3-0.6B for on-device inference on Apple Silicon (iPhone, iPad, Mac).

Model Details

Property	Value
Parameters	596M
Architecture	Qwen3 (GQA, RoPE, SwiGLU, RMSNorm)
Hidden size	1024
Layers	28
Attention heads	16 (8 KV heads)
Head dimension	128
Vocab size	151,936
Max sequence length	2048
Quantization	INT4 (per-block, linear symmetric)
Model size	~317 MB
CoreML target	iOS 18+ / macOS 15+
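
As a rough sanity check on the size figure, the packed weight payload can be estimated from the parameter count and bit width in the table. This is only a back-of-the-envelope sketch: it assumes every parameter is stored at 4 bits, and the variable names are illustrative. The result is about 298 MB, so the remainder of the ~317 MB is presumably quantization scales and package metadata.

```swift
// Back-of-the-envelope size check using figures from the table above.
let parameterCount = 596_000_000   // "Parameters: 596M"
let bitsPerWeight = 4              // INT4 quantization

// Packed weight payload only; per-block scales and metadata are NOT included.
// ASSUMPTION: every parameter is quantized to 4 bits.
let weightBytes = parameterCount * bitsPerWeight / 8
let weightMB = Double(weightBytes) / 1_000_000

print("Packed INT4 weights: \(weightMB) MB")
```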

Usage

This model is designed for use with the speech-swift Qwen3Chat module:

import Qwen3Chat

let model = try await Qwen3ChatModel.fromPretrained()

// Single generation
let response = try model.generate(messages: [
    ChatMessage(role: .user, content: "Hello!")
])

// Streaming
let stream = model.chatStream("What is Swift?", systemPrompt: "Be brief.")
for try await chunk in stream {
    print(chunk, terminator: "")
}

Prompt caching

The chat() / chatStream() methods cache the KV state of the system prompt. Subsequent turns restore that state from the cache instead of re-running prefill over the system prompt, which saves roughly 300 ms per turn.
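
The idea can be sketched in a few lines. Everything below is hypothetical and for illustration only: `KVSnapshot`, `SystemPromptCache`, and `fakePrefill` are not part of the speech-swift API, and the real module may store and key its cache differently. The sketch only shows the pattern of running prefill once per distinct system prompt and reusing the result afterwards.

```swift
// Hypothetical types for illustration only; not the speech-swift API.
struct KVSnapshot {
    var tokenCount: Int
    var keys: [[Float]]     // one flat buffer per layer
    var values: [[Float]]
}

final class SystemPromptCache {
    private var snapshots: [String: KVSnapshot] = [:]
    private(set) var prefillRuns = 0

    /// Returns the KV state for `systemPrompt`, running `prefill` only on a cache miss.
    func state(for systemPrompt: String,
               prefill: (String) -> KVSnapshot) -> KVSnapshot {
        if let cached = snapshots[systemPrompt] {
            // Swift arrays are value types, so the caller can extend its copy
            // without mutating the cached snapshot.
            return cached
        }
        prefillRuns += 1
        let fresh = prefill(systemPrompt)
        snapshots[systemPrompt] = fresh
        return fresh
    }
}

// Stand-in for the expensive prefill pass.
// ASSUMPTION: one token per character, purely to keep the sketch self-contained.
func fakePrefill(_ prompt: String) -> KVSnapshot {
    let n = prompt.count
    return KVSnapshot(tokenCount: n,
                      keys: [[Float](repeating: 0, count: n)],
                      values: [[Float](repeating: 0, count: n)])
}

let cache = SystemPromptCache()
let turn1 = cache.state(for: "Be brief.", prefill: fakePrefill)  // miss: runs prefill
let turn2 = cache.state(for: "Be brief.", prefill: fakePrefill)  // hit: restored from cache
print("prefill runs:", cache.prefillRuns)
```

Keying on the full system prompt string means that any change to the prompt results in a fresh prefill, which is the conservative choice for a sketch like this.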

Files

File	Description
`Qwen3Chat.mlpackage/`	CoreML model (INT4 weights, float16 activations)
`chat_config.json`	Model architecture config
`vocab.json`	BPE vocabulary (151,936 tokens)
`merges.txt`	BPE merge rules
`tokenizer_config.json`	Tokenizer settings + added tokens
`tokenizer.json`	Full tokenizer (HuggingFace format)

Conversion

Converted using coremltools 9.0 from the original PyTorch weights:

python scripts/convert_qwen3_chat_coreml.py \
    --hf-model Qwen/Qwen3-0.6B \
    --output models/Qwen3-0.6B-Chat-CoreML \
    --quantize int4

KV Cache Design

The CoreML model uses explicit KV cache inputs/outputs per layer:

Inputs: layer_{i}_key_cache, layer_{i}_value_cache (float16)
Outputs: layer_{i}_key_cache_out, layer_{i}_value_cache_out (float16)
Shape: [1, 8, seq_len, 128] (batch, kv_heads, sequence, head_dim)
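
To give a feel for what this layout costs in memory, the following sketch computes the total cache size from the shape above and the layer count in the Model Details table, and builds the input feature names from the stated pattern. The helper names are illustrative, and indexing layers from 0 is an assumption; the source gives only the `layer_{i}_...` pattern.

```swift
// Figures taken from the Model Details table and the cache shape above.
let layerCount = 28
let kvHeads = 8
let headDim = 128
let bytesPerElement = 2   // float16

/// Bytes needed for all key and value caches at a given sequence length.
func kvCacheBytes(seqLen: Int) -> Int {
    // Shape [1, kvHeads, seqLen, headDim] per tensor.
    let perTensor = 1 * kvHeads * seqLen * headDim * bytesPerElement
    return perTensor * 2 * layerCount   // key + value, for every layer
}

/// Input feature names following the layer_{i}_key_cache / layer_{i}_value_cache pattern.
/// ASSUMPTION: layers are indexed from 0.
func cacheInputNames() -> [String] {
    (0..<layerCount).flatMap { i in
        ["layer_\(i)_key_cache", "layer_\(i)_value_cache"]
    }
}

let fullContextBytes = kvCacheBytes(seqLen: 2048)
print(fullContextBytes / (1024 * 1024), "MiB at 2048 tokens")
print(cacheInputNames().count, "cache inputs")
```

At the maximum sequence length of 2048 this comes to 224 MiB across all layers, which is in the same range as the quantized weights themselves.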

License

Apache 2.0 (same as base model)
