DMax-Models
DMax: Aggressive Parallel Decoding for dLLMs
Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang
xML Lab, National University of Singapore
| Model | Description | Source Model | Link |
|---|---|---|---|
| 🤗 DMax-Math-16B | Highly parallel dLLM for math and reasoning. | LLaDA-2.0-mini | HF |
| 🤗 DMax-Coder-16B | Highly parallel dLLM for code generation. | LLaDA-2.0-mini | HF |
| Dataset | Description | Link |
|---|---|---|
| 📊 DMax-Math-Training-Data | Math trajectories generated by LLaDA-2.0-mini. | HF |
| 📊 DMax-Code-Training-Data | Code trajectories generated by LLaDA-2.0-mini. | HF |
Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model; trust_remote_code pulls in the custom parallel decoder from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "Zigeng/DMax-Math-16B", trust_remote_code=True, device_map="cuda:0"
)
model = model.to(torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("Zigeng/DMax-Math-16B", trust_remote_code=True)

prompt = (
    "A robe takes 2 bolts of blue fiber and half that much white fiber. "
    "How many bolts in total does it take?"
    "\nLet's think step by step\n"
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

# Parallel decoding: `block_length` sets the decoding block size, `threshold`
# controls how aggressively tokens are committed in parallel, and `nfe` is the
# number of forward passes (network function evaluations) actually used.
nfe, generated_tokens = model.generate_spd(
    inputs=input_ids,
    gen_length=2048,
    block_length=32,
    threshold=0.5,
)

generated_answer = tokenizer.decode(
    generated_tokens[0],
    skip_special_tokens=True,
)
print(generated_answer)
print("nfe:", nfe, "token length:", len(generated_tokens[0]))
```
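To see why `nfe` can be much smaller than the number of generated tokens, here is a minimal, hypothetical sketch of threshold-based parallel unmasking, the general idea behind confidence-thresholded dLLM decoding (a toy simplification for intuition, not the DMax implementation; `parallel_decode_block` and its inputs are invented for illustration):

```python
def parallel_decode_block(confidences, threshold=0.5):
    """Toy parallel decoder for one block of masked positions.

    `confidences[i]` stands in for the model's confidence at position i.
    Each loop iteration models one forward pass (one NFE): every masked
    position whose confidence clears `threshold` is committed together.
    Returns the list of positions committed at each step.
    """
    masked = set(range(len(confidences)))
    steps = []
    while masked:
        # Commit all sufficiently confident positions in one step.
        commit = {i for i in masked if confidences[i] >= threshold}
        if not commit:
            # Always make progress: fall back to the single best position.
            commit = {max(masked, key=lambda i: confidences[i])}
        steps.append(sorted(commit))
        masked -= commit
    return steps

steps = parallel_decode_block([0.9, 0.2, 0.8, 0.4], threshold=0.5)
print(steps)  # [[0, 2], [3], [1]] -> 4 tokens decoded in 3 steps, not 4
```

Lowering the threshold commits more tokens per step (fewer NFEs, faster decoding) at the cost of accepting lower-confidence predictions.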
Base model: inclusionAI/LLaDA2.0-mini