Comparison among Dolly v2 3b, 7b and 12b
#6, opened by abhi24
Hello!
I have only tried the Dolly v2 12b so far. I'm curious if anyone has tried all three.
- Is there a considerable difference in the response time?
- If I were to fine-tune the model, would I need fewer training samples with a smaller model?
Thanks,
Abhilash
I can tell you that on an A10, generation takes maybe 2-5 seconds for the 3B model, 5-15 seconds for the 7B model, and about 15-40 seconds for the 12B model in 8-bit. It really varies depending on the generation settings and how long the response ends up being. (I'd try an A100, but I can't get one at the moment!)
For real-time use, you'd need to do more than just load an HF pipeline: multiple GPUs, a fast tokenizer, etc.
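If you want to measure this on your own hardware, here is a minimal timing sketch, not a rigorous benchmark. The helper names (`time_generation`, `summarize`, `load_dolly`, `compare_models`) are made up for illustration. The loading arguments follow what I believe the Dolly model cards suggest (`torch_dtype=torch.bfloat16`, `trust_remote_code=True`, `device_map="auto"`), so treat them as assumptions and adjust for your GPU; it does not cover 8-bit loading.

```python
import time
from statistics import mean


def time_generation(generate, prompt, runs=3):
    """Call generate(prompt) `runs` times; return wall-clock seconds per call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to min / mean / max seconds."""
    return {"min": min(timings), "mean": mean(timings), "max": max(timings)}


def load_dolly(model_name):
    """Hypothetical loader; arguments are assumed from the model card."""
    # Imported lazily so the timing helpers work without transformers installed.
    import torch
    from transformers import pipeline

    return pipeline(
        model=model_name,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )


def compare_models(model_names, prompt, runs=3):
    """Load each model in turn and report its generation-time summary."""
    results = {}
    for name in model_names:
        generate = load_dolly(name)
        time_generation(generate, prompt, runs=1)  # warm-up call, discarded
        results[name] = summarize(time_generation(generate, prompt, runs))
    return results
```

For example, `compare_models(["databricks/dolly-v2-3b", "databricks/dolly-v2-7b"], "Explain what a tokenizer does.")` would return one min/mean/max entry per model. Expect the numbers to move a lot with the prompt and generation settings, as noted above.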
As for fine-tuning, I don't think there is necessarily a strong relationship between model size and the number of training samples you need, but I'm not an expert. I would use as much data as you've got!