Qwen3-50M with GPT-2 Tokenizer

A ~50M parameter version of Qwen3-0.6B that swaps in GPT-2's tokenizer, so it loads with standard HuggingFace tooling and no trust_remote_code.

Model Details

  • Base Model: Qwen/Qwen3-0.6B
  • Architecture: Qwen3 (8 layers, 384 hidden size)
  • Parameters: ~50M (reduced from 637M)
  • Tokenizer: GPT-2 (50,257-token vocabulary)
  • Vocabulary: reduced from 151,936 to 50,257 tokens (the swap is sketched in the snippet after this list)
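
The exact reduction recipe is not documented here. The snippet below is a minimal sketch, assuming the standard HuggingFace resize_token_embeddings route and the public "gpt2" tokenizer repo, of how such a vocabulary swap could be wired up; the resized embedding rows would not line up with GPT-2 token ids, so they would still need (re)training.

# Minimal sketch, NOT the author's documented recipe: shrink a Qwen3 model's
# vocabulary to match the GPT-2 tokenizer. Repo ids and the resize route are
# assumptions; the resized embeddings would still need retraining.
from transformers import AutoTokenizer, AutoModelForCausalLM

gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")            # 50,257 tokens
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")    # 151,936-token embeddings

# Resize the input/output embedding matrices to the GPT-2 vocabulary size.
base.resize_token_embeddings(len(gpt2_tokenizer))

# Keep special-token ids consistent with the new tokenizer.
base.config.eos_token_id = gpt2_tokenizer.eos_token_id
base.config.bos_token_id = gpt2_tokenizer.bos_token_id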

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Standard HuggingFace usage - no special flags needed
tokenizer = AutoTokenizer.from_pretrained("Mostafa8Mehrabi/qwen3-50m")
model = AutoModelForCausalLM.from_pretrained("Mostafa8Mehrabi/qwen3-50m")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
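
Because no custom code is required, the generic text-generation pipeline should also work. This is a convenience example reusing the repo id from above, not an officially documented entry point; the sampling settings are illustrative.

from transformers import pipeline

# Same repo id as the snippet above; max_new_tokens and do_sample are illustrative.
generator = pipeline("text-generation", model="Mostafa8Mehrabi/qwen3-50m")
print(generator("Hello, how are you?", max_new_tokens=30, do_sample=True)[0]["generated_text"])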

Key Features

  • ✅ Standard GPT-2 tokenizer (no trust_remote_code needed)
  • ✅ Tokenizer and model vocabulary sizes match (50,257); see the check after this list
  • ✅ Works like any other HuggingFace model
  • ✅ ~13x smaller than the original Qwen3-0.6B
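
A quick way to verify the vocabulary claim (a hedged check, reusing the repo id from the Usage section):

from transformers import AutoConfig, AutoTokenizer

# Both numbers are expected to be 50,257 if the tokenizer swap was applied as described.
tok = AutoTokenizer.from_pretrained("Mostafa8Mehrabi/qwen3-50m")
cfg = AutoConfig.from_pretrained("Mostafa8Mehrabi/qwen3-50m")
print(tok.vocab_size, cfg.vocab_size)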

Architecture Comparison

Component     Original   This Model
Parameters    637M       ~50M
Vocabulary    151,936    50,257
Hidden Size   1024       384
Layers        28         8
Tokenizer     Qwen3      GPT-2
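
Much of the reduction comes from the smaller embedding table combined with the narrower, shallower transformer. The back-of-the-envelope check below uses only the numbers from the table above (embedding term only, ignoring attention/MLP weights and weight tying).

# Rough arithmetic from the table above: the token embedding alone shrinks
# from ~155.6M to ~19.3M parameters (vocab_size * hidden_size).
orig_embedding  = 151_936 * 1024   # original Qwen3-0.6B embedding parameters
small_embedding =  50_257 *  384   # this model's embedding parameters
print(f"{orig_embedding:,} -> {small_embedding:,}")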