
# Qwen3-235B-A22B-EAGLE3 (Speculators Format)

This is a conversion of lmsys/Qwen3-235B-A22B-EAGLE3 to the vLLM speculators format for use with Eagle3 speculative decoding.

## Model Details

- **Base Model**: Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
- **Draft Model Architecture**: Llama-based Eagle3 head
- **Original Model**: lmsys/Qwen3-235B-A22B-EAGLE3
- **Format**: vLLM Speculators v0.1.0.dev42

## Model Configuration

- **Draft Vocabulary Size**: 32,000
- **Target Vocabulary Size**: 151,936
- **Hidden Size**: 4,096
- **Intermediate Size**: 24,576
- **Number of Layers**: 1 (Eagle3 head layer)
- **Attention Heads**: 64
- **KV Heads**: 4
- **Auxiliary Hidden State Layers**: [1, 46, 90]
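These values can be checked directly against the shipped `config.json`. Below is a minimal sketch using `huggingface_hub`; the keys it reads are the ones named on this card (see Conversion Details below), and any other layout details of the file are not guaranteed by this example:

```python
import json

from huggingface_hub import hf_hub_download

# Download only the config file from the converted repo.
config_path = hf_hub_download(
    repo_id="nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys",
    filename="config.json",
)
with open(config_path) as f:
    config = json.load(f)

# Keys named on this card; expected values from the list above.
print(config["architectures"])                     # ["Eagle3Speculator"]
print(config["eagle_aux_hidden_state_layer_ids"])  # [1, 46, 90]
print(config["transformer_layer_config"]["hidden_size"])  # 4096
```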

## Usage

This model is designed to be used with vLLM's Eagle3 speculative decoding implementation:

```python
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    speculative_config={
        "method": "eagle3",
        "model": "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys",
        "num_speculative_tokens": 3,
    },
    tensor_parallel_size=2,
)
```
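Generation then works exactly as with a plain vLLM engine; the Eagle3 draft head is applied transparently during decoding. Continuing from the `llm` object above:

```python
from vllm import SamplingParams

prompts = ["Explain speculative decoding in one paragraph."]
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# Speculative decoding happens inside generate(); the API is unchanged.
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```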

Alternatively, the same setup can be run from the command line using vLLM's example script:

```bash
python examples/offline_inference/spec_decode.py \
  --method "eagle3" \
  --tp 2 \
  --model-dir "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8" \
  --eagle-dir "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys" \
  --num-spec-tokens 3
```

## Conversion Details

The original Eagle3 config format has been converted to the vLLM speculators format with the following changes (an illustrative sketch follows the list):

1. **Architecture**: Changed from `LlamaForCausalLMEagle3` to `Eagle3Speculator`
2. **Config Structure**: Reorganized into `transformer_layer_config` and `speculators_config` sections
3. **Auxiliary Layers**: Extracted from `eagle_config.eagle_aux_hidden_state_layer_ids` to a top-level `eagle_aux_hidden_state_layer_ids` field
4. **Verifier Config**: Added an explicit verifier model specification
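For reference, the conversion amounts to a small rewrite of `config.json`. The sketch below is hypothetical, not the converter's actual code: it uses only the key names called out above, and the exact shape of the `speculators_config` / verifier section is an assumption:

```python
def to_speculators_config(original: dict) -> dict:
    """Hypothetical sketch of the Eagle3 -> speculators config mapping."""
    return {
        # 1. Architecture rename.
        "architectures": ["Eagle3Speculator"],
        # 2. The remaining Llama-style draft fields move under
        #    transformer_layer_config.
        "transformer_layer_config": {
            key: value
            for key, value in original.items()
            if key not in ("architectures", "eagle_config")
        },
        # 3. Auxiliary layer ids are hoisted to the top level.
        "eagle_aux_hidden_state_layer_ids": original["eagle_config"][
            "eagle_aux_hidden_state_layer_ids"
        ],
        # 4. Explicit verifier specification; this section's layout
        #    is assumed for illustration.
        "speculators_config": {
            "verifier": {
                "name_or_path": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
            }
        },
    }
```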

## Files

- `config.json`: Model configuration in speculators format
- `model.safetensors`: Model weights (unchanged from the original)

## Citation

If you use this model, please cite the original EAGLE-3 paper and credit the LMSYS team:

```bibtex
@article{li2025eagle3,
  title={EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2503.01840},
  year={2025}
}
```

## License

Same as the original model: lmsys/Qwen3-235B-A22B-EAGLE3
