Questions

#1
by GhostGate - opened

Hi!

I am trying to ascertain the exact role of this model. Was this done to decrease the size of Largestral and provide a better alternative, while keeping the "smarts", so to say? I am curious what the actual effects of pruning it are and what has been achieved by doing so (benefit-wise, in terms of creativity, logic, reasoning, coherence, etc.).

I can't speak for the author, but for me at least, if the intelligence is on par with the original Mistral-Large-2407, then this smaller pruned version would provide faster prompt ingestion, thus allowing me to increase context size.

> what are the actual effects of pruning

Possibly more stupidity in some areas

> what has been achieved doing so

The model takes less VRAM, so more people will be able to run it.

Yep, makes sense to me. I read up on it in more detail and that answered my questions. Either way, I think it's a good trade if we can save VRAM at the cost of some layers that provide little value for the model's purpose. Let's see how it holds up in further testing.
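For anyone curious about the VRAM point, the saving is roughly proportional to the fraction of decoder layers dropped, since most of a dense transformer's parameters live in its layers. Here is a back-of-the-envelope sketch; the function name, the 5% non-layer-parameter fraction, and the layer/parameter counts in the example are all illustrative assumptions, not the actual configuration of this model:

```python
def pruned_vram_gb(total_params_b: float, n_layers: int, dropped: int,
                   bytes_per_param: float = 2.0,
                   non_layer_fraction: float = 0.05) -> float:
    """Rough weights-only VRAM estimate (GB) after dropping decoder layers.

    Assumes parameters are spread evenly across layers, with a small
    fixed fraction (embeddings, head) unaffected by pruning.
    total_params_b is in billions, so params_b * bytes_per_param ~ GB.
    """
    layer_params = total_params_b * (1.0 - non_layer_fraction)
    kept_layer_params = layer_params * (n_layers - dropped) / n_layers
    remaining = kept_layer_params + total_params_b * non_layer_fraction
    return remaining * bytes_per_param

# Illustrative numbers only: a 123B-parameter, 88-layer model at fp16.
full = pruned_vram_gb(123, 88, dropped=0)
pruned = pruned_vram_gb(123, 88, dropped=20)
print(f"full: {full:.0f} GB, pruned: {pruned:.0f} GB")
```

This ignores KV cache and activations, which is exactly why a pruned model also helps with context size: the per-token KV cache shrinks with the layer count too.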

GhostGate changed discussion status to closed
TheDrummer changed discussion status to open
