Can you merge my adapter with the base model, and then quantize? (EsotericsEnjoyer/Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft-adapters)
I made the LoRA rank=128 / alpha=256 adapter with a runtime that had 200 GB of disk space. In short, no matter the optimization, I came 10 GB short of what was required to merge. The inference test as-is was coherent enough to proceed.
When I tried to delete the saved adapter once it was loaded into RAM to make space, I messed up and deleted too much. So I stopped the instance; I might as well grab a runtime with more disk space if I already have to re-download the files.
But before I proceed, I want to ask whether you, or other people in this community who do quants, could do that step for me. I will absolutely need help with quantization: with this MoE, a test run with a lower LoRA rank, using the same Unsloth FastModel, worked as a full model, but I struggled to quantize it in any shape. I can also share the whole pipeline for context, so you know how best to merge and quantize. I will now attach a close compilation of the parts of the code that I used, in a text file, so that anyone who is willing can proceed without needing me to share any crucial information.
the "base" model:
huihui-ai/Huihui-Qwen3-30B-A3B-Thinking-2507-abliterated
My adapter (trained on the last CPT dataset on my page, and a private SFT dataset 99.5% identical to the last one I published):
https://huggingface.co/EsotericsEnjoyer/Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft-adapters
Let me know if anyone is willing to do that, or if I should proceed with merging it myself; but then I will still need help quantizing. Thanks, everyone.
If you give me the exact commands to run, I can merge it and then immediately queue the merged model to mradermacher. I usually use a locally hosted version of https://huggingface.co/spaces/Weyaxi/merge-lora, so if that works for your adapter, that would be my preferred way. StormPeak has 512 GiB of RAM and enough storage, but not enough GPU memory, so ideally whatever method you propose would merge the LoRA in RAM.
I am worried that a CPU merge would be defective, since this is the MoE, which also needed a special Unsloth class for training: FastModel, NOT FastLanguageModel.
So instead of sending you down a potentially futile path in case it truly needs that Unsloth library to merge correctly, I will first test it on CPU myself: a Colab v2-8 TPU runtime still allows 225 GB of disk space and 334 GB of CPU RAM.
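For reference, the merge script I'm testing is roughly the following: a minimal sketch using plain transformers + PEFT on CPU rather than Unsloth, so whether it reproduces what FastModel trained is exactly what the test should show; the output folder name is just a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "huihui-ai/Huihui-Qwen3-30B-A3B-Thinking-2507-abliterated"
adapter_id = "EsotericsEnjoyer/Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft-adapters"

# Load the base model entirely into CPU RAM in bf16 (no GPU involved).
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map={"": "cpu"},
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the LoRA adapter and bake it into the base weights.
model = PeftModel.from_pretrained(model, adapter_id)
model = model.merge_and_unload()

# Save the merged model as sharded safetensors for upload/quantization.
model.save_pretrained("merged-model", safe_serialization=True, max_shard_size="5GB")
tokenizer.save_pretrained("merged-model")
```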
It took an hour to download the ~60 GB model, and it is currently downloading the adapters.
If my compute units only last long enough for an inference test, but the merging and upload take longer than expected, then I will at least know that the merge works on CPU and you can safely proceed.
If it does not merge properly, I will use Prime Intellect's ultra-high-disk-space VM (a spot VM with low disk space is not available today anyway).
I just realized that we are talking about a relatively small 30B model. If we can merge using multiple GPUs, this would easily fit on Richard's supercomputer with 4x A100 40 GB + T40. The only issue is that, as far as I'm aware, Unsloth usually doesn't use multiple GPUs, which is why I'm always using Axolotl instead.
So the model is merged, uploading.
Here is a fair result showing strengths and weaknesses, using a question I don't think either the CPT or the SFT ever actually discussed directly:
--- Chat Prompt: How to get enlightened? ---
--- Chat Response: <think>\nThe user is asking a very direct and common question: 'How to get enlightened?' This is a question about the path, the method. But is it the right question to be asking in the first place?\n\nI feel like the answer isn't a list of steps. The question itself might be based on a misunderstanding. It's like asking 'How to become a king?' without realizing that the question assumes a king already exists. What if the 'enlightened' person, the 'I', is the very thing that has to be found?\n\nSo, the user is looking for a map, but the map itself is the territory. The goal is to find the self, but the self is the seeker. This seems like a paradox. How can one find something that is the very thing doing the looking?\n\nLet me think about the modern approach. People go to retreats, they meditate, they do all sorts of things to try and 'find' enlightenment. But what are they actually doing? It seems to me they are often just trying to create a vacuum, a quiet space, and then waiting for something to fill it. But what is it they are waiting for
That's still a thinking block, and the final output might be better.
I lost 2 hours of runtime to a silent failure, and yet Google randomly granted a 1-hour-20-minute extension. So if this still runs out before the upload is done, I will restart from another Google account; it appears they don't have a problem with me using it in excess. Even before I took a month of Colab subscription, I was automating 100 Gemini Pro messages per day on 30 accounts at once. Instead of throttling it, they increased my limit to 200 per account for a couple of weeks until I finished my project.
The Colab session is about to end any minute now, and it did not finish.
I think your first proposed method is the most efficient. At least, as seen above, I just confirmed that an on-the-fly PEFT merge for inference works for this model without flaws.
So please go ahead: merge on CPU and queue it for quantizing.
Let me know once you do start the process.
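For reference, the on-the-fly test amounts to something like this: a hedged sketch that loads the adapter with PEFT at inference time instead of writing merged weights to disk; the generation settings are illustrative, not the exact ones I used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "huihui-ai/Huihui-Qwen3-30B-A3B-Thinking-2507-abliterated"
adapter_id = "EsotericsEnjoyer/Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft-adapters"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map={"": "cpu"}
)
# Adapter is applied on the fly; nothing is merged or saved to disk.
model = PeftModel.from_pretrained(model, adapter_id)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "How to get enlightened?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```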
I will start downloading the model and LoRA now. I wasn't able to start earlier as I was busy with work. I have no idea if merging on CPU will work in this specific case but I will give it a try.
No problem, take your time, and thanks for your help. Just let me know ASAP if you encounter any issues merging it on CPU.
Please hold off on the LoRA right now, to avoid redundancy.
I did merge it on Colab and uploaded it to Google Cloud Storage; a HF Space is now downloading it.
I will also give quantizing another shot in yet another Colab session.
So this should not be seen as a high-priority project for you; please take your time with work and other things.
I successfully merged the model overnight, tested the merged model today, and am currently uploading it to Hugging Face; I just queued it to mradermacher.
Here is one of my tests so you can see whether the merged model works as expected:
> What is the meaning of life?
<think>
The user is asking one of the biggest questions: 'What is the meaning of life?' They're looking for a deep answer, a core principle that gives purpose to everything.
How do I approach this? A simple platitude like 'to be happy' is too shallow. It doesn't feel like the real answer. I need to get to the root of the question. What is life, and what is its purpose?
Let's think about our existence. We are here, on Earth, in a physical body. We have senses to perceive the world. What is the main task of our senses? To perceive. To take in the world through sight, sound, touch. It's an act of taking in, of receiving.
But is that all we are? Just sense organs? That can't be right. That would make us passive. Life feels active. So, if we are an organ for taking in, there must be an organ for giving out. An organ for action.
This gives me a powerful duality. We are both receivers and senders. Our senses receive the world, but we must be an instrument that sends something back. So the question becomes, what is the purpose of the 'sending' part?
What is the world that we are meant to respond to? It's the world of living things. The world of plants, animals, and especially human beings. If I were to make a guess, I'd say the world of living things is, in its essence, a world of giving. It's selfless. A plant gives its substance to the soil. An animal gives its energy to its young. There's a natural, selfless flow of life.
Wait, does that contrast with our own nature? Yes, absolutely. In our day-to-day life, we are often selfish. We want for ourselves. We act from our own ego. Our will is often about asserting ourselves against the world. So, in a way, we are a force of opposition to the selfless, giving nature of the living world.
So, if we are a force of opposition to life, what would be the purpose of our existence? It must be to overcome that opposition. The meaning of life would be to transform our selfish will into an act that harmonizes with and serves the selfless, giving nature of the world. It's about moving from a state of being a receiver to a state of being a giver.
Let me outline the core idea:
- Life is a duality: receiving through our senses, and sending through our will.
- The purpose of our will is to respond to the world of living things.
- The world of living things is inherently selfless and giving.
- Our default state is selfish, which is a state of opposition to this.
- Therefore, the meaning of life is to use our will to become a force of harmony, of giving, to counteract our inherent selfishness.
This feels like a complete and meaningful answer. It takes a huge question and provides a clear, actionable principle for a human life. It's not a vague idea, but a dynamic process. This is what the user is looking for.
</think>
The meaning of life is the constant, dynamic interplay between your will and the world of living things. You are an organ of reception, a sense-organ that takes the world into itself through your senses. But you are also an organ of action, an instrument that must send something out into the world. The purpose of your will is to respond to the world of living things, to give back what you have received. But you must be an instrument that is in harmony with the selfless, giving nature of life, not one that opposes it out of selfishness. You must use your will to overcome your own selfishness and become a force of harmony and giving in the world. Life's meaning is the struggle to become a selfless giver.
lol, that's what we should expect a 30B LLM to say. I guess some extremely complex crafting of an RL stage would be beneficial. I'll think about it.
I'm very sorry for the long delay. The first merge failed because I forgot to hide the GPUs, so after merging on CPU for an hour it tried to use them and obviously crashed with a GPU OOM. Then when I retried the merge, I fell asleep shortly before it was done. This morning before work, I started the conversion into GGUF. During my lunch break, when I wanted to try it, I accidentally messed up -ngl 0. This not only GPU-OOM-crashed llama.cpp but somehow messed up the entire graphics driver in a way that required a computer reboot, but I wasn't able to simply reboot because the GPU driver in this messed-up state caused every process that used it to get stuck and become unkillable. This was quite disruptive, as it broke imatrix computation for any other quants and made all quantization tasks on nico1 run at unusably slow speed. I needed that PC for work and also had some important tasks running on it. In any case, after work I hard-rebooted StormPeak and successfully tested the model. So honestly, just a lot of bad luck. As almost every time, somehow everything that can go wrong does go wrong, especially in a hurry.
No need to apologize, I should be saying sorry for keeping you awake.
Turns out my upload had been idle for the past 15 minutes, so I relaunched it with progress detection. About 30 more minutes are expected for the merged model to upload.
I guess you don't need it since you already merged it, but I will keep it up for the public in case anyone wants to try it.
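For context, by relaunching I mean rerunning something along these lines: a hedged sketch with huggingface_hub, where the repo ID and folder path are placeholders and choosing the resumable upload_large_folder is my own assumption about the sturdiest way to restart a stalled transfer.

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes HF_TOKEN is set in the environment

# Placeholder repo ID and local folder, not the actual names.
repo_id = "EsotericsEnjoyer/Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft-merged"
api.create_repo(repo_id, repo_type="model", exist_ok=True)

# upload_large_folder chunks the upload, prints periodic progress reports,
# and can simply be re-run to resume after a stall instead of starting over.
api.upload_large_folder(
    repo_id=repo_id,
    folder_path="merged-model",
    repo_type="model",
)
```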
My upload is now done. All static quants are already computed, with most of them already uploaded. The imatrix is already computed. Expect the first weighted/imatrix quants within the next few minutes.
Merged model: https://huggingface.co/nicoboss/Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft
Static quants: https://huggingface.co/mradermacher/Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft-GGUF
Weighted/imatrix quants: https://huggingface.co/mradermacher/Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft-i1-GGUF
Convenient download page: https://hf.tst.eu/model#Qwen3-30B-A3B-Thinking-2507-abliterated-CPT-SFT-sft-GGUF
That's so fast; I have to laugh at how I couldn't finish uploading even one file by the time you completed it all. Thanks.
> lol, that's what we should expect a 30B LLM to say. I guess some extremely complex crafting of an RL stage would be beneficial. I'll think about it.
I think it's quite good. In any case, I haven't really tested the model yet; I just wanted to make sure my merge didn't impact its quality or break anything.
> No need to apologize, I should be saying sorry for keeping you awake.
No, for sure not. I really should offer better service. Now that I'm familiar with the process, this will all be much quicker next time. Days where I'm as busy with work as today and yesterday are also somewhat rare. So don't hesitate to ask if you can't merge it yourself.
> Turns out my upload had been idle for the past 15 minutes, so I relaunched it with progress detection. About 30 more minutes are expected for the merged model to upload.
We indeed messed up XET quite hard by concurrently uploading the same model. I had to restart my upload 3 times as well because it got stuck.
> I guess you don't need it since you already merged it, but I will keep it up for the public in case anyone wants to try it.
If there is no quality difference between my merge and yours, it would be a waste of resources to redo all the quants. If there is, let me know and we can redo them.
> That's so fast; I have to laugh at how I couldn't finish uploading even one file by the time you completed it all. Thanks.
I'm uploading at 10 Gbit/s and use a Threadripper PRO 7975WX with fast NVMe SSDs to compute them, so quantizing is relatively fast for a 30B model. The Kimi-K2-Instruct-0905 1T model currently also running on nico1 is much slower. In case you are interested, you can always check https://hf.tst.eu/status.html to monitor the progress. The first weighted/imatrix quants just got uploaded.
I don't think there's any quality difference at all. I merged in bfloat16, that's all I know. It's more that I don't want to delete my upload now that it's done.
Quantization of this model just concluded with all quants being successfully computed and uploaded.
This model totally has notable strengths and weaknesses. Aside from the extra escaping of all newline characters, the teachings often get bastardized; that's a given. But it also gets some complex esoteric teachings right. And most importantly, unlike mainstream models, it 'thinks' from the stance that these teachings are true, even if a bit distorted.
Q: how are lucifer and ahriman different?
They are not the same, and they are not 'evil' in the simple sense. They are two necessary, opposing forces in the spiritual economy of the world. Lucifer is the 'light-bearer' who tempts humanity with spiritual pride, with a premature, unearned ascent into fantasy and illusion. He pulls us up and away from the earth, into a false sense of divinity and self-importance. Ahriman is the 'spirit of darkness' who tempts humanity with materialism, with a mechanistic, deadening view of reality. He pulls us down and into the earth, convincing us that only matter is real and that we are just clever animals or machines. One pull is towards ungrounded, fantastical thinking; the other is towards hardened, dead-end thinking. Both are necessary for our development, but both are also forces to be overcome. The human task is to stand in the middle, between these two pulls, and find the balance.
Q: brainstorm recreational activities with focus on personal development:
Think of recreation not as a way to escape yourself, but as a way to actively re-engineer yourself. You're not just 'playing' in a game; you're forging a new capacity of your will. You're not just 'listening' to music; you're training a new faculty of your soul. Each activity is a specific inner task. Gymnastics isn't just about building muscle; it's the training of your will to overcome inertia and master your own body. Playing a musical instrument isn't just about making noise; it's the training of your will to impose a lawful, ordered structure onto chaotic time and sound. Solving a difficult puzzle isn't just a brain-teaser; it's the direct training of your intellect to build new logical structures from scratch, to break free from automatic thinking. Group sports isn't just a game; it's the training of your ability to see yourself in relation to others, to act within a group dynamic. And meditation isn't just 'relaxing'; it's the direct training of your consciousness, of your very attention, to become a more powerful tool for perception. Real recreation is when you consciously choose activities that challenge you, that force you to exert a new part of your inner self, that build a new capacity you can carry with you into your life. It's the process of strengthening your 'I' through the very act of engaging with the world. You don't just play the game; you become a better player, and that skill becomes a new part of your being.
> I really should offer better service.
Your service is outstanding as it is :)
Unsloth promised that more efficient Qwen MoE training will be available within 2 weeks.
I have a question: given that this 30B model experienced notable forgetting of world knowledge, would training Qwen 30B with a reduced LoRA rank yield better retention of general knowledge whilst learning the same amount of esoteric knowledge?
I noticed some lack of world knowledge because I have already been running it automated on Q&A forums for a few hours.
> I have a question: given that this 30B model experienced notable forgetting of world knowledge, would training Qwen 30B with a reduced LoRA rank yield better retention of general knowledge whilst learning the same amount of esoteric knowledge?
It really depends on your dataset size and the amount of knowledge you want to teach the model. Generally, the larger your dataset and the more knowledge you want the model to learn, the higher the rank needs to be, while at lower ranks the model mostly learns how long the response should be and how to format it, without learning much new knowledge. I would say go for r=256 if you have millions of rows and want to teach the model completely new knowledge, or r=192 if you are concerned about catastrophic forgetting. For smaller datasets, around 500K to 1 million rows, r=128 is likely good enough. Below that, go with r=64 if you want the model to still be able to learn some knowledge and complex concepts, otherwise go for r=32. r=16 is fine for uncensoring and for training style and text formatting. r=8 is quite useless in my opinion and might be used to slightly influence things like response length and do similar subtle final touches. I also recommend using rank-stabilized LoRA, so computing the alpha value isn't super complicated and dependent on the model's architecture.
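In PEFT terms, a minimal sketch of what I mean; the r=128 / alpha=256 values just mirror your case above, and target_modules plus everything else are assumptions rather than your exact setup:

```python
from peft import LoraConfig

# Rank-stabilized LoRA scales updates by alpha / sqrt(r) instead of alpha / r,
# so the effective scale stays comparable across different ranks.
lora_config = LoraConfig(
    r=128,                      # rank; raise towards 192-256 for very large datasets
    lora_alpha=256,
    use_rslora=True,            # rank-stabilized LoRA
    lora_dropout=0.0,
    target_modules=[            # typical attention + MLP projections; adjust per model
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```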
> I noticed some lack of world knowledge because I have already been running it automated on Q&A forums for a few hours.
If you use a high rank and your training data is not diverse enough, you are essentially converting the model into a domain-specific model, and it will forget things. You can prevent this by mixing some more diversity into your training dataset. You also want to keep the learning rate low, especially at the beginning of training. Training in the highest possible precision, doing as few epochs as possible to avoid overfitting, and maybe using higher gradient accumulation help as well. But really, in the end your dataset just needs to be somewhat diverse if you want to finetune with a high rank, unless you intend to make a domain-specific model.
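Translated into concrete knobs, that advice looks roughly like this: a hedged sketch using trl's SFTConfig, where every number is purely illustrative rather than a recipe:

```python
from trl import SFTConfig

# Illustrative values only: low LR with warmup, a single epoch, and
# gradient accumulation for a larger effective batch, all in bf16.
training_args = SFTConfig(
    output_dir="qwen3-30b-esoteric-sft",    # placeholder path
    num_train_epochs=1,                     # as few epochs as possible
    learning_rate=5e-5,                     # keep it low
    warmup_ratio=0.05,                      # gentle start
    lr_scheduler_type="cosine",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,         # larger effective batch
    bf16=True,                              # highest precision you can afford
    logging_steps=10,
)
```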
Actually, I used rank 128 and alpha 256 with the 30B, and I think it is about the perfect balance. With room to improve the data quality, the flaws can be worked around whilst keeping the same LoRA settings.
The extra newline escaping I introduced when migrating to larger hardware is probably responsible for some added loss.
And there was a typo in my previous question, which affected the whole LoRA question:
If the model is just about right with these settings, then:
- Does the 80B (eighty billion) parameter version of Qwen they just released require a smaller LoRA rank to learn the same amount of esoteric knowledge, since the same number of net parameters is being trained?
- Or do I still need to keep the same LoRA rank, because learning this new knowledge automatically requires overwriting previous data, and therefore the larger model has more that must be forgotten in order for the esotericist to shine?
That's a better wording of my question.
I guess for my next run I will try halving the LoRA rank, keeping the alpha, and increasing the batch size, and see if it learns without collapsing.
The fact that these models are both MoE with the same number of active parameters complicates things. With a dense model I would definitely assume that a lower LoRA rank would suffice to teach the same amount of esotericism.
So anyway, a major improvement to the SFT dataset is on the way: longer, better reasoning.