Tiny models
Hey man, really in love with your stuff, but I gotta say guys like me are GPU poor. I have an idea that could strike a lot of happy mediums across the board.
Models like the Wolverines are cool, even the duller ones, but A3B models just seem to work better.
But there's a crossroads with context limits, and it gets hit a lot with model sizes bigger than 16B and contexts of 10K or more.
So here's my pitch: what if models like this Qwen3-4B had coder variants, as well as a minimum (A"N"B) activated amount, like MoEs normally do? BUT instead of four copies of the same coder, combine different variations, like: this model does planning really well, and this model does coding REALLY well.
Loading a 16B model isn't hard for me; I run 64GB RAM, but all on CPU.
Coders and guys like me need TINY models that think FAST or not at all ("reasoning is seriously a drag to watch").
So something like ((Qwen2.5-coder-3x7B-A3B-MAX coder-imatrix-gguf types, with Brainstorm 20x or something)).
Small, plenty of headroom, and tons of talent in the coding field. I looked and you've got A LOT of variation, but none tuned to the task. Why not just kill the weights not responsible for TALKING, thinking, and coding? A lot of the other stuff in these models is just useless benchmark shit that serves no purpose.
Would love to hear your thoughts and possibly collab, because, well, I'm working in AGI and this area of training and coding LLMs isn't really my forte.
Thank you in advance, man... appreciate it.
I hear you; at the moment making "MoEs" of smaller Qwens is not possible; it needs Mergekit updates (awaiting them...).
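(For reference, once support lands, the kind of mergekit-moe config that would drive a "planner expert + coder expert" build looks roughly like the sketch below. This is untested and the expert models named are just placeholders, not a recipe.)

```yaml
# Sketch of a mergekit-moe config - NOT a tested recipe; the expert models
# below are placeholders, and small Qwen3 MoE targets are not supported yet.
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
gate_mode: hidden          # route tokens by hidden-state similarity to the prompts below
dtype: bfloat16
experts:
  - source_model: Qwen/Qwen2.5-Coder-7B-Instruct   # "does coding REALLY well"
    positive_prompts:
      - "write a python function"
      - "fix this bug"
  - source_model: Qwen/Qwen2.5-7B-Instruct         # "does planning really well"
    positive_prompts:
      - "plan the architecture"
      - "break this task into steps"
# Built with: mergekit-moe config.yaml ./output-model
```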
However, these may be of interest to you: coders from 0.8B to 12B in 3 collections:
You may also want to see this (as Jan V1 is a coder too, and a very good one):
https://huggingface.co/DavidAU/Qwen3-Jan-v1-256k-ctx-6B-Brainstorm20x
What about narrowing the weights by removing non-active nodes, to get just the weights used for coding, English, and dev work...
Then Brainstorm the hell out of it, and a light post-train???
Run a LoRA-type pass with next to zero for the updates, just to see what gets activated and what doesn't, weight by weight, over a dataset of Python and dev work.
Invert and remove???
It could easily just be a small 3B instruct model, right???
Am I barking up the wrong tree here???
Thanks again, dude.
It is a lot easier to train / make a domain-specific model than to remove weights (reverse training?) as noted.
There might be a way to do this; might want to Google it.
It could create a real mess / damage the model.
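If you want to experiment anyway, the probing half of your idea is roughly the sketch below (untested; the model name, layer filter, prune fraction, and calibration prompts are all placeholders): run a coding dataset through the model, record which MLP units barely fire, then zero them out.

```python
# Rough sketch of "probe activations on a coding set, then prune the quiet weights".
# Untested - model name, layer filter and calibration prompts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-3B-Instruct"  # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

# 1) Hook the MLP linear layers and accumulate mean |activation| per output unit.
stats, hooks = {}, []

def make_hook(name):
    def hook(module, inputs, output):
        act = output.detach().abs().mean(dim=(0, 1))  # (batch, seq, feat) -> (feat,)
        stats[name] = stats.get(name, torch.zeros_like(act)) + act
    return hook

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear) and "mlp" in name:
        hooks.append(module.register_forward_hook(make_hook(name)))

# 2) Run a small coding / dev calibration set through the model.
calibration = [
    "def fibonacci(n):",
    "Write a Python function that parses a JSON config file.",
    # ...a few hundred representative coding prompts in practice
]
with torch.no_grad():
    for text in calibration:
        model(tok(text, return_tensors="pt").input_ids)

for h in hooks:
    h.remove()

# 3) "Invert and remove": zero the least-active output units of each hooked layer.
#    Expect quality damage without a light post-train afterwards.
prune_fraction = 0.2
for name, module in model.named_modules():
    if name in stats:
        k = int(stats[name].numel() * prune_fraction)
        idx = torch.topk(stats[name], k, largest=False).indices
        with torch.no_grad():
            module.weight[idx, :] = 0.0
            if module.bias is not None:
                module.bias[idx] = 0.0
```

Note this is basically crude activation-magnitude pruning, and zeroed rows only shrink the file if you requantize afterwards; it's a way to see how much damage the idea does, not a finished pipeline.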