allura-org/MoE-Girl_400MA_1BT
hai ! back with another club banger of a miniature moe model for phones n stuff
https://huggingface.co/allura-org/MoE-Girl_400MA_1BT
The last time somebody requested something for their phone, it was a 1700B model.
Uhm.. :) It's queued and should be done in no time :)
Unfortunately, the model is broken (contains nans):
nan detected in blk.0.attn_output.weight
The static quants have been generated, but will be of limited value.
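For reference, the kind of pre-quant sanity check that flags this is just a finiteness scan over every tensor. A minimal sketch assuming plain numpy arrays keyed by tensor name (swap in whatever loader your checkpoint format needs; the `find_nan_tensors` helper and the toy tensors are illustrative, not the actual quanting pipeline):

```python
import numpy as np

def find_nan_tensors(tensors):
    """Return the names of tensors containing nan or inf values."""
    bad = []
    for name, w in tensors.items():
        # isfinite is False for both nan and +/-inf
        if not np.isfinite(w).all():
            bad.append(name)
    return bad

# toy example: one damaged weight, one clean one
tensors = {
    "blk.0.attn_output.weight": np.array([[0.1, float("nan")], [0.2, 0.3]]),
    "blk.1.attn_output.weight": np.array([[0.1, 0.4], [0.2, 0.3]]),
}
print(find_nan_tensors(tensors))  # ['blk.0.attn_output.weight']
```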
hmm... that's odd; vllm and assorted inference software work fine, i'll have to look into it
thx for trying though! <3
it's possible that the damage stays contained to only a part of the model, but a nan means that any computation based on that weight will produce more nans, trickling through the model. a nan is basically an error value. most inference frameworks do not check for them, including llama.cpp when inferencing, so the static quants may still work to some extent, unless model verification is enabled, at which point it will refuse to load.
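The trickling is easy to see in a two-layer toy example: one nan entry poisons every dot product that touches it, and the next matmul spreads it to every downstream element (toy matrices, not real model weights):

```python
import numpy as np

# a single nan in a weight matrix...
w = np.array([[0.5, float("nan")],
              [0.25, 0.75]])
x = np.array([1.0, 2.0])

# ...poisons every output whose dot product touches it
y = w @ x            # y[0] is nan, y[1] is still finite (1.75)

# and the next dense layer spreads it to everything downstream
w2 = np.ones((2, 2))
y2 = w2 @ y          # both elements are nan now
print(y, y2)
```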
Yes, that's the llama name for its tensor. I have no clue how it maps those, let me grep around...
blk.{bid}.attn_output ← "model.layers.{bid}.self_attn.o_proj",  # llama-hf nemotron olmoe
Yeah, pretty much looks like it.
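For anyone curious how a template table like that resolves per-layer names, here's a toy sketch (not the actual llama.cpp mapping code; `MAPPING` is trimmed to just the one entry under discussion): the `{bid}` placeholder is matched against the layer index in the HF name and substituted into the GGUF name.

```python
import re

# hf -> gguf name templates, trimmed to the single tensor discussed above
MAPPING = {
    "model.layers.{bid}.self_attn.o_proj": "blk.{bid}.attn_output",
}

def map_name(hf_name):
    """Resolve an HF tensor name to its GGUF name by matching the layer index."""
    for hf_tmpl, gguf_tmpl in MAPPING.items():
        # turn the template into a regex with a capture group for the layer id
        pattern = re.escape(hf_tmpl).replace(r"\{bid\}", r"(\d+)")
        m = re.fullmatch(pattern, hf_name)
        if m:
            return gguf_tmpl.format(bid=m.group(1))
    return None

print(map_name("model.layers.0.self_attn.o_proj"))  # blk.0.attn_output
```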