Existing on-prem scale-out for mere mortals...?
#1 by Icecream102 · opened
What are the current state and recommendations for fine-tuning a medium-sized model if you don't have a datacenter full of A100s, but rather a heterogeneous set of machines with various NVIDIA GPUs, from GeForce 10- to 40-series cards (e.g. 1080 Ti, 2080, 3090, 4090) and/or less costly older Teslas (e.g. M40, P40)?
The same question applies to inference with large models.
(Sorry if there is a better place for this question!)
The new QLoRA method allows fine-tuning a 65B model in 48 GB of VRAM, or a 30/33B model in 24 GB. That's the best option for resource-efficient training right now.
Here are a couple of videos on it:
https://youtu.be/8vmWGX1nfNM
https://youtu.be/fYyZiRi6yNE
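
For concreteness, here's a minimal sketch of the QLoRA recipe using Hugging Face `transformers`, `bitsandbytes`, and `peft`: the base model is loaded with its weights quantized to 4-bit NF4, and only small LoRA adapters are trained on top. The model name and hyperparameters below are illustrative placeholders, not values from the videos.

```python
# Minimal QLoRA setup sketch: 4-bit NF4 base model + trainable LoRA adapters.
# Model ID and hyperparameters are placeholders; adjust for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-30b"  # placeholder; pick a model that fits your VRAM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced by QLoRA
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # use torch.float16 on cards without bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across the available GPUs (and CPU if needed)
)

# Cast layer norms / enable gradient checkpointing hooks for stable k-bit training.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                 # adapter rank; the QLoRA paper uses 64
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # which linear layers get adapters
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights train; the 4-bit base stays frozen
```

The resulting `model` can then be passed to a standard `Trainer`. Note the `device_map="auto"` line: it also lets the quantized model shard across a heterogeneous set of GPUs for inference, though on older pre-Ampere cards (10-/20-series, M40/P40) you'd want `torch.float16` as the compute dtype since they lack bfloat16 support.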