Existing on-prem scale-out for mere mortals...?
#1 by Icecream102 · opened
What are the current state and recommendations for fine-tuning a medium-sized model if you don't have a datacenter full of A100s, but rather a heterogeneous set of machines with various NVIDIA GPUs, from GeForce 10- to 40-series cards (e.g. 1080 Ti, 2080, 3090, 4090) and/or less costly older Teslas (e.g. M40, P40)?
The same question applies to inference with large models.
(Sorry if there is a better place for this question!)
The new QLoRA method allows fine-tuning a 65B model in 48 GB of VRAM, or a 30/33B model in 24 GB. That's the best option for resource-efficient training right now.
Here are a couple of videos on it:
https://youtu.be/8vmWGX1nfNM
https://youtu.be/fYyZiRi6yNE
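
For concreteness, here's a minimal sketch of the QLoRA recipe using Hugging Face `transformers`, `bitsandbytes`, and `peft`: the base model is loaded with its weights quantized to 4-bit NF4, and only small LoRA adapters are trained on top. The model name and hyperparameters below are illustrative placeholders, not values from the videos.

```python
# Minimal QLoRA setup sketch: 4-bit NF4 base model + trainable LoRA adapters.
# Model ID and hyperparameters are placeholders; adjust for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-30b"  # placeholder; pick a model that fits your VRAM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced by QLoRA
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # use torch.float16 on cards without bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across the available GPUs (and CPU if needed)
)

# Cast layer norms / enable gradient checkpointing hooks for stable k-bit training.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                 # adapter rank; the QLoRA paper uses 64
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # which linear layers get adapters
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights train; the 4-bit base stays frozen
```

The resulting `model` can then be passed to a standard `Trainer`. Note the `device_map="auto"` line: it also lets the quantized model shard across a heterogeneous set of GPUs for inference, though on older pre-Ampere cards (10-/20-series, M40/P40) you'd want `torch.float16` as the compute dtype since they lack bfloat16 support.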