Questions

#1
by GhostGate - opened

Hi!

I am trying to ascertain the exact role of this model. Was this done to decrease the size of Largestral and provide a better alternative, while keeping the "smarts", so to say? I am curious what the actual effects of pruning it are and what has been achieved by doing so (benefit-wise, in terms of creativity, logic, reasoning, coherence, etc.).

I can't speak for the author, but for me at least, if the intelligence is on par with the original Mistral-Large-2407, then this smaller pruned version would provide faster prompt ingestion, thus allowing me to increase context size.

> what are the actual effects of pruning

Possibly more stupidity in some areas

> what has been achieved doing so

The model takes less VRAM, so more people will be able to run it.

Yep, makes sense to me. I read up on it in more detail and that answered my questions. Either way, I think it's a good trade if we can save VRAM at the cost of some layers that provide little value for the model's purpose. Let's see how it holds up in further testing.
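For anyone curious about the VRAM point, the saving is roughly proportional to the fraction of decoder layers dropped, since most of a dense transformer's parameters live in its layers. Here is a back-of-the-envelope sketch; the function name, the 5% non-layer-parameter fraction, and the layer/parameter counts in the example are all illustrative assumptions, not the actual configuration of this model:

```python
def pruned_vram_gb(total_params_b: float, n_layers: int, dropped: int,
                   bytes_per_param: float = 2.0,
                   non_layer_fraction: float = 0.05) -> float:
    """Rough weights-only VRAM estimate (GB) after dropping decoder layers.

    Assumes parameters are spread evenly across layers, with a small
    fixed fraction (embeddings, head) unaffected by pruning.
    total_params_b is in billions, so params_b * bytes_per_param ~ GB.
    """
    layer_params = total_params_b * (1.0 - non_layer_fraction)
    kept_layer_params = layer_params * (n_layers - dropped) / n_layers
    remaining = kept_layer_params + total_params_b * non_layer_fraction
    return remaining * bytes_per_param

# Illustrative numbers only: a 123B-parameter, 88-layer model at fp16.
full = pruned_vram_gb(123, 88, dropped=0)
pruned = pruned_vram_gb(123, 88, dropped=20)
print(f"full: {full:.0f} GB, pruned: {pruned:.0f} GB")
```

This ignores KV cache and activations, which is exactly why a pruned model also helps with context size: the per-token KV cache shrinks with the layer count too.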

GhostGate changed discussion status to closed
TheDrummer changed discussion status to open
