Correctness on the example question

#1
by benhaotang - opened

After running it 10 times locally @ Q6_K

For task one:

  • 100% correct: 2 times
  • at least one of the diagrams is correct: 5 times
  • completely wrong: 3 times

This seems around 30-40% better than my local Phi4 @ Q6_K. Note that the result for Claude Sonnet 3.5 is {3,7,0}.
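For anyone who wants to compare tallies directly, here is a minimal sketch that turns the counts into per-outcome rates. It assumes {3,7,0} uses the same ordering as the list above (fully correct, at least one diagram correct, completely wrong); the label "local model @ Q6_K" and the helper `rates` are just illustrative names, not anything from the model card.

```python
# Tallies over 10 runs each, ordered as
# (fully correct, at least one diagram correct, completely wrong).
# Assumption: the Claude tally {3,7,0} follows this same ordering.
tallies = {
    "local model @ Q6_K": (2, 5, 3),   # illustrative label for the local run
    "Claude Sonnet 3.5": (3, 7, 0),
}


def rates(counts):
    """Return each outcome's share of the total number of runs."""
    total = sum(counts)
    return tuple(c / total for c in counts)


for name, counts in tallies.items():
    full, partial, wrong = rates(counts)
    print(f"{name}: full={full:.0%} partial={partial:.0%} wrong={wrong:.0%}")
```

With only 10 runs per model the rates are rough, so small differences between tallies should not be over-read.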

For tasks 2 & 3: just have fun... Most of the time even the momentum is wrongly defined, so maybe this is too hard for a local model :( For Claude, the loop contribution is usually correct if the first task is correct, but none of the code is runnable... it always tries to get too fancy.
