How to reproduce the benchmark score?
#4, opened by lincharliesun
The README says:
"We run the benchmarks up to 16 times and average the scores to be more accurate."
I'm curious how the 16 runs are set up. Is it sampling 16 times with the same seed, or sampling once for each of 16 different seeds?
Would there be a big difference between these two methods?
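To make the two options concrete, here is a toy sketch of what I mean. It is not the repo's evaluation code: `run_benchmark`, `n_questions`, and `p_correct` are made-up stand-ins, and each "run" is just a seeded coin flip per question.

```python
import random
import statistics


def run_benchmark(seed: int, n_questions: int = 200, p_correct: float = 0.7) -> float:
    """Hypothetical stand-in for one benchmark run; returns accuracy in [0, 1]."""
    rng = random.Random(seed)  # all sampling randomness comes from this seed
    correct = sum(rng.random() < p_correct for _ in range(n_questions))
    return correct / n_questions


# Option A: re-seed with the same value before each of the 16 runs.
same_seed_scores = [run_benchmark(seed=0) for _ in range(16)]

# Option B: one run for each of 16 different seeds.
multi_seed_scores = [run_benchmark(seed=s) for s in range(16)]

for name, scores in [("same seed", same_seed_scores), ("16 seeds", multi_seed_scores)]:
    print(name, "mean:", statistics.mean(scores), "std:", statistics.pstdev(scores))
```

In this toy version, option A returns the identical score on every run, so averaging adds nothing, while option B gives a spread of scores. I'm not sure the same holds for the real pipeline: if the seed is set once and the RNG state advances between runs, or if the hardware is not fully deterministic, "same seed" runs could still differ.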
Thanks!
