MLP weights in each layer seem to be different

#4
by Jeethu - opened

Thanks for releasing the MobiLlama models! IIUC, the paper states that the FFN blocks are shared across all transformer blocks. I've verified that this is indeed the case with the 0.5B model, where the MLP weights are identical in every layer, but it doesn't seem to hold for this model. Is there a reason for this discrepancy?
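
In case it helps, here's roughly the check I ran: a minimal sketch assuming a LLaMA-style module layout where each block's FFN lives at `model.model.layers[i].mlp`. The repo id below is illustrative (the 0.5B checkpoint); substitute the model this thread is about, and adjust the attribute path if the custom modeling code names things differently.

```python
# Compare each layer's MLP parameters against layer 0.
# Assumptions (not confirmed by the repo): LLaMA-style layout with
# model.model.layers[i].mlp, and the illustrative repo id below.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "MBZUAI/MobiLlama-05B",  # illustrative repo id -- swap in the right one
    trust_remote_code=True,   # the model ships custom modeling code
)

layers = model.model.layers
reference = dict(layers[0].mlp.named_parameters())
for i, layer in enumerate(layers[1:], start=1):
    for name, param in layer.mlp.named_parameters():
        # same storage => the parameter is literally shared (aliased);
        # equal values => identical weights, possibly separate copies
        shared = param.data_ptr() == reference[name].data_ptr()
        equal = torch.equal(param, reference[name])
        print(f"layer {i} mlp.{name}: same storage={shared}, equal values={equal}")
```

With true block-level sharing the parameters should alias the same storage (both checks True); independently trained FFNs would print False for both.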
