3.8TB model?!?!
Um. Is this meant to be a joke?
If anyone could even remotely touch this, they wouldn't be getting their models off HF; they'd be building them themselves. And it's a good way to piss off HF and get your account removed, which would be a shame, as you do a lot of good things for the community who can't (or won't) do conversions on their own.
Or is this really just backing up training data?
Nobody said you must run the unquantized model. Doing so would be insane. There are already GGUF quants of FatLlama available that you can use instead, and you don't even need expensive hardware to do so: 512 GiB of RAM and any NVIDIA GPU with at least 8 GiB of VRAM are enough to run i1-IQ2_M. I know because I tried it. Using RPC I can even run this model in IQ4_XS on my home setup (512 GiB + 256 GiB + 128 GiB). That is how I computed the imatrix for the weighted/imatrix quants provided by mradermacher; rough example commands are below the links.
Static quants: https://huggingface.co/mradermacher/FATLLAMA-1.7T-Instruct-GGUF
Weighted/imatrix quants: https://huggingface.co/mradermacher/FATLLAMA-1.7T-Instruct-i1-GGUF
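In case anyone wants to try the same thing, this is roughly the shape of the llama.cpp commands involved. Treat it as a sketch: binary names and flags can differ between builds, and the model filenames, IPs, ports and calibration file below are placeholders, not my exact setup.

```sh
# On each remote machine that contributes RAM, start an RPC worker
# (built from llama.cpp with RPC enabled; host/port are placeholders):
rpc-server --host 0.0.0.0 --port 50052

# On the main machine, point llama-cli at the RPC workers and let it
# spread the layers across the local GPU and the remote backends:
llama-cli -m FATLLAMA-1.7T-Instruct.i1-IQ4_XS.gguf \
  --rpc 192.168.1.11:50052,192.168.1.12:50052 \
  -ngl 99 -p "Hello"

# Computing the importance matrix for weighted/imatrix quants is the same
# idea, just with the imatrix tool and a calibration text file:
llama-imatrix -m FATLLAMA-1.7T-Instruct.gguf \
  --rpc 192.168.1.11:50052,192.168.1.12:50052 \
  -f calibration.txt -o imatrix.dat
```

The only real trick is that --rpc takes a comma-separated list of host:port pairs, so each extra box you start rpc-server on just gets appended to that list.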
Q2? I'll pass.