Does the model actually work?
Does it even run inference correctly?
I believe that even models meant to be further finetuned after a merge like this are effective if they run inference properly right after the merge, like my model here:
https://huggingface.co/Replete-AI/Llama-3-11.5B-Instruct-V2
There are already a few post-merge finetunes on my page from the last few days; I'd suggest using this one for new finetunes and using those for inference.
athirdpath/Llama-3-15b-Instruct-GLUED for most purposes, athirdpath/Llama-3-15b-Instruct-GLUED-Plus for eRP
@athirdpath
Those models seem like they're already trained. You're missing the point: what I'm saying is that a model will perform much better after training if it can run inference immediately after the passthrough merge without issues, meaning it retains much of its intelligence and will make a much more successful finetune. The model I linked can be used fully without any finetuning even though it went through a passthrough merge, which means that after finetuning it would perform much better than even this 15b model, despite mine being only 11.5b in size.
At least, that's my working theory.
So if you can figure out how to make a 15b-parameter passthrough merge that runs inference properly, with no loss, right after the passthrough, then the finetune afterward would be much more successful.
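For reference, the sanity check I run is nothing fancy: load the raw merge and throw a couple of prompts at it. A minimal sketch, assuming the merge was saved to a local folder (the path and prompts below are just placeholders):

```python
# Quick post-merge sanity check. Assumes the passthrough merge was saved to a
# local folder (placeholder path below) and that transformers + accelerate are
# installed with enough memory to load the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "./llama-3-15b-passthrough"  # placeholder: output dir of the merge

tokenizer = AutoTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(
    merged_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A broken passthrough usually fails obviously here: repetition, gibberish,
# or answers that ignore the prompt entirely. (For a stricter test, wrap the
# prompts with tokenizer.apply_chat_template instead of feeding them raw.)
prompts = [
    "What is 17 * 23?",
    "Summarize the plot of Romeo and Juliet in two sentences.",
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

If those come back coherent (and the math answer is actually 391), the merge kept enough of its smarts to be worth finetuning.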
Ah, I understand.
This does work, but like the early stages of my Iambe models, it has a few oddities. I arranged the layers such that there's no way it could ever maintain all of its smarts, but that's not my goal with this line.
These 15b models exist to test a theory of mine: that by manipulating mostly the second and third fifths of the model's layers, you can adjust how it analyzes narratives and chats.
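To put rough numbers on that, here's an illustration only, assuming a 32-layer Llama-3-8B-style decoder stack (the exact slice boundaries in the GLUED configs may differ):

```python
# Illustration only: split a 32-layer Llama-3-8B-style decoder stack into
# fifths to see which layers the "2/5" and "3/5" regions roughly cover.
NUM_LAYERS = 32  # Llama-3-8B decoder layer count

def fifth(n: int) -> range:
    """Decoder-layer indices in the n-th fifth (1-based) of the stack."""
    start = round((n - 1) * NUM_LAYERS / 5)
    end = round(n * NUM_LAYERS / 5)
    return range(start, end)

for n in range(1, 6):
    layers = fifth(n)
    print(f"fifth {n}: layers {layers.start}-{layers.stop - 1}")

# With 32 layers, the 2nd fifth lands at roughly layers 6-12 and the 3rd at
# roughly 13-18 -- those are the regions being duplicated and rearranged in
# the passthrough to probe how the model handles narratives and chats.
```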
My 11b model does, however, work great out of the box; it can even do math. None of my tests expanding L3 the classical way past 11b showed any performance improvement, but they did come with higher compute costs.