Can't reproduce MATH performance

#66
by jpiabrantes - opened

In the model card you say you achieved 51.9 on MATH using Llama 3.1 8B with zero-shots.

I can't reproduce this. Did you use lm-evaluation-harness or some other code?

You may need to remove the system role if you use the default apply_chat_template function.

Sign up or log in to comment