Update safetensors to have embedding layer

#7

Fixes https://github.com/huggingface/transformers/issues/34759

Proposed solution:
The safetensors file was missing the embedding layer. I loaded the model from the existing PyTorch weights file and re-saved it in safetensors format.

You can test the updated safetensors with the following script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-125M", use_fast=False)

# Old safetensors from the Hub (missing the embedding layer)
mobilellm_old = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M", trust_remote_code=True, use_safetensors=True
)
# Updated safetensors from the local checkout of this branch
mobilellm = AutoModelForCausalLM.from_pretrained(
    "/Users/mayankagarwal/Documents/OSS/codebases/MobileLLM-125M",
    trust_remote_code=True, use_safetensors=True,
)

inputs = tokenizer("Hello world!", return_tensors="pt")

output_old = mobilellm_old.generate(**inputs)
decoded = tokenizer.decode(output_old[0], skip_special_tokens=True)
print("Old decoded output:", decoded)

output = mobilellm.generate(**inputs)
decoded = tokenizer.decode(output[0], skip_special_tokens=True)
print("Updated decoded output:", decoded)
```

Here's a screenshot of the output

