# Qwen3-235B-A22B-EAGLE3 (Speculators Format)
This is a conversion of lmsys/Qwen3-235B-A22B-EAGLE3 to the vLLM speculators format for use with Eagle3 speculative decoding.
## Model Details
- Base Model: Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
- Draft Model Architecture: Llama-based Eagle3 head
- Original Model: lmsys/Qwen3-235B-A22B-EAGLE3
- Format: vLLM Speculators v0.1.0.dev42
## Model Configuration
- Draft Vocabulary Size: 32,000
- Target Vocabulary Size: 151,936
- Hidden Size: 4,096
- Intermediate Size: 24,576
- Number of Layers: 1 (Eagle3 head layer)
- Attention Heads: 64
- KV Heads: 4
- Auxiliary Hidden State Layers: [1, 46, 90]
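The shape parameters above can be cross-checked against each other; a minimal sketch in plain Python (the field names here are illustrative, not necessarily those used in the actual `config.json`):

```python
# Draft-head shape parameters from the list above, collected into a dict.
# Field names are illustrative assumptions, not the real config keys.
draft_config = {
    "draft_vocab_size": 32_000,
    "target_vocab_size": 151_936,
    "hidden_size": 4_096,
    "intermediate_size": 24_576,
    "num_hidden_layers": 1,
    "num_attention_heads": 64,
    "num_key_value_heads": 4,
    "eagle_aux_hidden_state_layer_ids": [1, 46, 90],
}

# Grouped-query attention: query heads must divide evenly over KV heads.
assert draft_config["num_attention_heads"] % draft_config["num_key_value_heads"] == 0

# Per-head dimension implied by hidden size and head count.
head_dim = draft_config["hidden_size"] // draft_config["num_attention_heads"]
assert head_dim == 64
```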
## Usage
This model is designed to be used with vLLM's Eagle3 speculative decoding implementation:
```python
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    speculative_config={
        "method": "eagle3",
        "model": "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys",
        "num_speculative_tokens": 3,
    },
    tensor_parallel_size=2,
)
```
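The `num_speculative_tokens` setting controls how many draft tokens the Eagle3 head proposes per verification step. A toy, model-free sketch of greedy draft verification (a simplification, not vLLM's actual sampling-based implementation) shows why the number of accepted draft tokens drives the speedup:

```python
def verify(draft: list[int], verifier: list[int]) -> list[int]:
    """Toy greedy verification: keep the longest prefix of the draft that
    matches the verifier's own predictions, then emit one verifier token.
    The verifier list has one more token than the draft (the 'bonus' token).
    This is a simplification of Eagle3's actual acceptance logic."""
    n = 0
    while n < len(draft) and draft[n] == verifier[n]:
        n += 1
    return verifier[: n + 1]

# With num_speculative_tokens=3: if all 3 drafts match, 4 tokens per step.
assert verify([5, 9, 2], [5, 9, 2, 7]) == [5, 9, 2, 7]

# First mismatch at position 1: only 2 tokens emitted this step.
assert verify([5, 3, 2], [5, 9, 2, 7]) == [5, 9]
```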
Or via the command line:
```bash
python examples/offline_inference/spec_decode.py \
    --method "eagle3" \
    --tp 2 \
    --model-dir "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8" \
    --eagle-dir "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys" \
    --num-spec-tokens 3
```
## Conversion Details
The original Eagle3 config format has been converted to the vLLM speculators format with the following changes:
- Architecture: Changed from `LlamaForCausalLMEagle3` to `Eagle3Speculator`
- Config Structure: Reorganized into `transformer_layer_config` and `speculators_config` sections
- Auxiliary Layers: Extracted from `eagle_config.eagle_aux_hidden_state_layer_ids` to top-level `eagle_aux_hidden_state_layer_ids`
- Verifier Config: Added explicit verifier model specification
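The changes above can be sketched as a small config-rewriting function. Key names on the speculators side beyond those listed above (e.g. the exact contents of `speculators_config`) are assumptions, not the real converter:

```python
def convert_eagle3_config(orig: dict, verifier_model: str) -> dict:
    """Sketch of the conversion described above (assumed layout, not the
    actual speculators conversion tool)."""
    cfg = dict(orig)
    # Architecture: LlamaForCausalLMEagle3 -> Eagle3Speculator.
    cfg["architectures"] = ["Eagle3Speculator"]
    # Auxiliary layers: hoist from eagle_config to the top level.
    eagle_cfg = cfg.pop("eagle_config", {})
    cfg["eagle_aux_hidden_state_layer_ids"] = eagle_cfg.get(
        "eagle_aux_hidden_state_layer_ids", [])
    # Config structure: nest transformer fields under transformer_layer_config
    # and add a speculators_config section with an explicit verifier.
    layer_keys = ("hidden_size", "intermediate_size", "num_attention_heads",
                  "num_key_value_heads", "num_hidden_layers")
    cfg["transformer_layer_config"] = {
        k: cfg.pop(k) for k in layer_keys if k in cfg
    }
    cfg["speculators_config"] = {"verifier": {"name_or_path": verifier_model}}
    return cfg

converted = convert_eagle3_config(
    {"architectures": ["LlamaForCausalLMEagle3"],
     "hidden_size": 4096,
     "eagle_config": {"eagle_aux_hidden_state_layer_ids": [1, 46, 90]}},
    verifier_model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
)
assert converted["architectures"] == ["Eagle3Speculator"]
assert converted["eagle_aux_hidden_state_layer_ids"] == [1, 46, 90]
assert converted["transformer_layer_config"]["hidden_size"] == 4096
```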
## Files
- `config.json`: Model configuration in speculators format
- `model.safetensors`: Model weights (unchanged from original)
## Citation
If you use this model, please cite the original Eagle3 paper and the LMSYS team:
```bibtex
@article{li2024eagle,
  title={EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2406.16858},
  year={2024}
}
```
## License
Same as the original model: lmsys/Qwen3-235B-A22B-EAGLE3