4

wang

wzx111

AI & ML interests

None yet

Recent Activity

new activity about 2 months ago

wzx111/Qwen3-1.7B-MATH-GDPO: Which post-training method was actually used for this model, GDPO or GRPO?

updated a dataset 3 months ago

wzx111/MATH-lighteval-level3

published a dataset 3 months ago

wzx111/MATH-lighteval-level3

Organizations

None yet

New activity in wzx111/Qwen3-1.7B-MATH-GDPO about 2 months ago

Which post-training method was actually used for this model, GDPO or GRPO?

#1 opened about 2 months ago by

New activity in Qwen/Qwen3-235B-A22B 9 months ago

Does the reward function lack an n-gram repetition penalty?

#7 opened 10 months ago by

New activity in Qwen/Qwen3-1.7B 10 months ago

【Evaluation】Best practice for evaluating Qwen3 !!

#2 opened 10 months ago by

New activity in wzx111/Qwen2.5-1.5B-Open-R1-GRPO 10 months ago

Improve language tag

#1 opened 10 months ago by