# Qwen3-50M with GPT-2 Tokenizer

A ~50M-parameter version of Qwen3-0.6B that uses the GPT-2 tokenizer, so it loads with standard Hugging Face tooling and no `trust_remote_code`.

## Model Details
- Base Model: Qwen/Qwen3-0.6B
- Architecture: Qwen3 (8 layers, hidden size 384)
- Parameters: ~50M (reduced from 637M)
- Tokenizer: GPT-2 (50,257 vocabulary)
- Vocabulary: reduced from 151,936 to 50,257 tokens (see the sketch below)
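The card does not say how the vocabulary was cut from 151,936 to 50,257 entries. The minimal sketch below shows one common way to do this with the `transformers` API, offered purely as an assumption about the procedure: attach the GPT-2 tokenizer and resize the base model's embedding matrices to match. How the surviving rows are selected or re-initialized is not documented here.

```python
# Hedged sketch of a vocabulary swap; this is NOT necessarily the procedure
# used to build this checkpoint.
from transformers import AutoTokenizer, AutoModelForCausalLM

gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")          # 50,257 tokens
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")

# Shrink the input/output embeddings to the GPT-2 vocabulary size.
base.resize_token_embeddings(len(gpt2_tokenizer))
print(base.get_input_embeddings().weight.shape)                 # torch.Size([50257, 1024])
```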
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Standard HuggingFace usage - no special flags needed
tokenizer = AutoTokenizer.from_pretrained("Mostafa8Mehrabi/qwen3-50m")
model = AutoModelForCausalLM.from_pretrained("Mostafa8Mehrabi/qwen3-50m")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
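As an optional follow-up to the snippet above, the quick check below verifies the numbers stated in this card (a 50,257-token vocabulary and roughly 50M parameters). It reuses the `tokenizer` and `model` objects already loaded.

```python
# Optional sanity check, reusing the objects loaded above.
print(len(tokenizer))                                               # expected: 50257
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M")   # expected: ~50M
```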
## Key Features

- ✅ Standard GPT-2 tokenizer (no `trust_remote_code` required)
- ✅ Tokenizer and model vocabulary sizes match
- ✅ Works like any other Hugging Face model
- ✅ ~13x smaller than the original Qwen3-0.6B
## Architecture Comparison
| Component | Original | This Model |
|---|---|---|
| Parameters | 637M | ~50M |
| Vocabulary | 151,936 | 50,257 |
| Hidden Size | 1024 | 384 |
| Layers | 28 | 8 |
| Tokenizer | Qwen3 | GPT-2 |
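For readers who want to see how a configuration with these dimensions looks in code, the sketch below builds an untrained model with the same vocabulary size, hidden size, and layer count. The attention head counts, head dimension, MLP width, and context length are illustrative assumptions (they are not published in this card), so the resulting parameter count will only roughly approximate ~50M. It assumes a `transformers` release with native Qwen3 support (4.51 or later).

```python
# Illustrative sketch only: vocab_size, hidden_size, and num_hidden_layers
# come from the table above; every other hyperparameter is an assumption.
from transformers import Qwen3Config, Qwen3ForCausalLM

config = Qwen3Config(
    vocab_size=50257,              # GPT-2 vocabulary (from the table)
    hidden_size=384,               # from the table
    num_hidden_layers=8,           # from the table
    num_attention_heads=6,         # assumption: 6 heads
    head_dim=64,                   # assumption: 6 x 64 matches the hidden size
    num_key_value_heads=2,         # assumption: grouped-query attention
    intermediate_size=1536,        # assumption: 4x hidden size
    max_position_embeddings=4096,  # assumption
)
model = Qwen3ForCausalLM(config)   # randomly initialized, architecture only
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```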