To the few who can run it: post your speeds, setup, and results quality.
I use the 4-bit version on a 128 GB MacBook M3 Max. All the samples in the model card come from that.
How well does the q_4 version compare to Miqu 120b?
How many tokens per second can you reach with your MacBook this way? @ehartford
This is a very different model. Not really comparable.
Hi Eric, how is your t/s? Dare I say sub 1 t/s?
I quantized the OG model myself, so it's q4_k_m. Here is what I am getting on a Mac Studio (M1, 64-core GPU, 128 GB RAM):
total duration: 2m30.232853459s
load duration: 1.101417ms
prompt eval count: 23 token(s)
prompt eval duration: 3.85487s
prompt eval rate: 5.97 tokens/s
eval count: 720 token(s)
eval duration: 2m26.375722s
eval rate: 4.92 tokens/s
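The reported rates are just the token counts divided by the durations (in seconds); a quick sanity check against the numbers above, assuming any POSIX awk:

```shell
# Verify the reported rates: rate = token count / duration in seconds.
awk 'BEGIN { printf "prompt: %.2f tokens/s\n", 23 / 3.85487 }'      # 5.97
awk 'BEGIN { printf "eval:   %.2f tokens/s\n", 720 / 146.375722 }'  # 4.92
```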
Hi, how can I merge the .aa and .ab files?
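Files ending in `.aa` and `.ab` are typically produced by the Unix `split` command, and concatenating the parts in order with `cat` rebuilds the original file. A minimal sketch using a dummy file (the actual model filenames here are an assumption):

```shell
# Recreate a split-then-merge round trip on a stand-in file.
printf 'dummy model data' > model.gguf           # stand-in for the real model file
split -b 8 model.gguf model.gguf.                # produces model.gguf.aa and model.gguf.ab
cat model.gguf.aa model.gguf.ab > merged.gguf    # concatenate parts in order
cmp model.gguf merged.gguf && echo "merge OK"
```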