Downscaling the `Q_q` and `W_k` matrices for repeated layers in franken-merges
14
#4 opened 7 months ago by jukofyork
Guidance on GPU VRAM Split?
5
#3 opened 10 months ago by nmitchko
Performance
13
#2 opened 10 months ago by KnutJaegersberg