Model upload incomplete?
Noticed that when converting to GGUF we end up with a missing output_weight, but only for the 2.4B model.
Also noticed that the other two models have a final safetensors file containing lm_head.weight, whereas this one doesn't, so I'm wondering if it somehow got missed. Your own GGUF files seem to have the proper output_weight, so they must have been made from the complete original files.
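In case it helps, something like this should list which safetensors shards actually contain lm_head.weight (just a rough sketch; the local snapshot directory is a placeholder):

import glob
from safetensors import safe_open

# Placeholder path to a locally downloaded copy of the 2.4B checkpoint
for path in sorted(glob.glob('./exaone-2.4b/*.safetensors')):
    with safe_open(path, framework='pt', device='cpu') as f:
        has_head = 'lm_head.weight' in f.keys()
        print(path, 'contains lm_head.weight' if has_head else 'no lm_head.weight')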
"tie_word_embeddings" is true! The output_weight is just the embedding matrix....
Strange then that the GGUF I produce is different from the one they posted.
Hello, thank you for your question.
We set tie_word_embeddings to True on the 2.4B model for memory efficiency at the inference stage.
However, we found that the conversion steps for AWQ and GGUF do not handle a model with tie_word_embeddings, so the "tied" model needs to be converted into an "un-tied" model before quantization.
When you make another GGUF model yourself, please copy model.transformer.wte to model.lm_head and save the untied model before quantization/conversion.
I'll look into how to do that later today and get back to you
Do you have any reference material on how to do this? From a quick search online I can't find any reference to converting a "tied" model to an "un-tied" one.
Load w/ huggingface transformers, and copy the tensor there?
It's a bit unclear to me too, but I made this tiny script and it worked; make sure to replace the directories:
import torch
from transformers import AutoModelForCausalLM

# Load the tied model (replace the path with your local EXAONE folder)
model = AutoModelForCausalLM.from_pretrained('/model/exaone_folder/')

# Copy the embedding matrix into lm_head as an independent parameter
model.lm_head.weight = torch.nn.Parameter(model.transformer.wte.weight.clone().detach())
print(model.lm_head.weight.requires_grad)

# Untie in the config so the saved checkpoint (and later converters) treat lm_head as separate;
# this overwrites the original folder, so point it elsewhere to keep the tied copy
model.config.tie_word_embeddings = False
model.save_pretrained('/model/exaone_folder/')
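As a sanity check on the resulting GGUF, you can list its tensors and look for the output head (which llama.cpp stores as output.weight). A rough sketch, assuming the gguf Python package that ships with llama.cpp is installed and exaone-2.4b.gguf is a placeholder filename:

from gguf import GGUFReader

reader = GGUFReader('exaone-2.4b.gguf')  # placeholder path to the converted file
names = [t.name for t in reader.tensors]  # tensor names stored in the GGUF
print('output.weight' in names)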