Is the 8-bit GPTQ of the 8B base model available?

by AshTaurus - opened Apr 21

Discussion

AshTaurus

Apr 21

I actually need the base model for my use case. It would be very helpful if you could upload it. Thanks.

davidxmle

Astronomer org Apr 21

Hey, we are delaying the release of the base (non-instruct) model quants because of a bug in Llama 3 that is still under investigation.
See the link here: https://twitter.com/danielhanchen/status/1781395882925343058.
Some tokens in the base model are undertrained, which leads to terrible training results.

I think the solution has been found, so we may release the 2 models very soon, either today or tomorrow.

Are you looking to fine-tune on the base?

davidxmle

Astronomer org Apr 22

@AshTaurus Here it is: https://huggingface.co/astronomer-io/Llama-3-8B-GPTQ-8-Bit. If you are doing instruct fine-tuning, please read the top of the README file. I may either release a script in the folder or release a patched version of the model in which the embedding vectors of the special tokens are initialized to the mean value of each embedding dimension (vector component), so you don't get exploding or NaN gradients during training.
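
For readers wondering what "initialized to the mean" could look like in practice, here is a minimal pure-Python sketch on a toy embedding table. It is not the script mentioned above. The helper names `find_near_zero_rows` and `patch_rows_with_mean` are hypothetical, and two things are assumptions made for illustration: that undertrained rows can be spotted because all their components are close to zero, and that the mean is taken over the remaining rows only.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def find_near_zero_rows(embeddings: Matrix, tol: float = 1e-8) -> List[int]:
    """Return indices of rows whose components are all within tol of zero.

    Assumption: undertrained token rows look like (near-)zero vectors.
    """
    return [
        i for i, row in enumerate(embeddings)
        if all(abs(x) <= tol for x in row)
    ]


def patch_rows_with_mean(embeddings: Matrix, untrained_ids: Sequence[int]) -> Matrix:
    """Return a copy in which each listed row is replaced by the
    per-dimension mean of the rows that are not listed.

    Assumption: the mean is computed over the remaining rows only.
    """
    untrained = set(untrained_ids)
    trained = [row for i, row in enumerate(embeddings) if i not in untrained]
    if not trained:
        raise ValueError("no trained rows to average over")
    dim = len(trained[0])
    mean = [sum(row[d] for row in trained) / len(trained) for d in range(dim)]
    # Copy every row so the input table is left untouched.
    return [
        list(mean) if i in untrained else list(row)
        for i, row in enumerate(embeddings)
    ]


# Toy table: rows 1 and 3 stand in for undertrained special tokens.
toy = [[1.0, 2.0], [0.0, 0.0], [3.0, 6.0], [0.0, 0.0]]
ids = find_near_zero_rows(toy)
patched = patch_rows_with_mean(toy, ids)
```

With a real model the same idea would be applied to the rows of the token embedding matrix before fine-tuning; which rows count as undertrained should be checked against the README rather than taken from this sketch.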

AshTaurus

Apr 22

Thanks ❤️

AshTaurus changed discussion status to closed Apr 22
