ssmits/Qwen2.5-95B-Instruct not running

#962
by ssmits - opened

Hi HF team,

I made ssmits/Qwen2.5-95B-Instruct specifically to check whether a near-100B model could beat the scores of Qwen2.5-72B-Instruct.
Unfortunately, it won't run, or at least I don't see the results show up. The architecture is exactly the same as the 72B model's; it only has ~25 extra layers.

Cheers,
Stijn

Open LLM Leaderboard org

Hi @ssmits !
Can you follow the steps in the FAQ and give us the link to the request file?

Thank you for your swift response. I checked every step and think I've followed them correctly. I just found the entry in the request dataset:
ssmits/Qwen2.5-95B-Instruct
9c0e7df57a4fcf4d364efd916a0fc0abdd2d20a3
bfloat16
94.648
Qwen2ForCausalLM
Original
RUNNING
"2024-09-26T19:13:02"
💬 : 💬 chat models (RLHF, DPO, IFT, ...)
8938706
2024-09-26T19:13:20.797580
true
ssmits

Apparently it has been running for 6 days. Is this normal for a model this size?
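
For reference, here is a minimal sketch of how the time spent in the current status could be worked out from the entry above. The key names `status` and `job_start_time` are my assumption about how the fields are labelled (the values are copied from the entry), and the timestamp is treated as timezone-naive.

```python
from datetime import datetime

# Values copied from the request entry above.
# The key names are assumptions, not confirmed field names.
request = {
    "status": "RUNNING",
    "job_start_time": "2024-09-26T19:13:20.797580",
}


def days_in_status(entry, now=None):
    """Return the number of days elapsed since the entry's job_start_time."""
    started = datetime.fromisoformat(entry["job_start_time"])
    now = now or datetime.now()  # naive local time, like the stored timestamp
    return (now - started).total_seconds() / 86400


if __name__ == "__main__":
    print(f"{request['status']} for {days_in_status(request):.1f} days")
```

This only measures time since the recorded start, not actual compute time.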

ssmits changed discussion status to closed
ssmits changed discussion status to open
Open LLM Leaderboard org

Hi! Next time, please provide the link to the file; it will be easier for us to debug :)

The model was probably not running from the precise moment of submission, as we use the spare cycles of the cluster - it's also possible that the job was cancelled and rescheduled if something more important was launched (like model training or equivalent).
To give you an idea, such a big model would likely take around 2 days (if it can fit on one node with 8 GPUs), so the current situation is normal. Feel free to reopen in a week if it still has not finished!

clefourrier changed discussion status to closed

Thank you very much and thanks for the feedback. I will wait patiently :)
