Qwen 3.5 4B TE with Anima 2B - WORKING NOW
I built a ComfyUI custom node to support the Qwen 3.5 4B hybrid (Mamba2 + Attention) text encoder from nightknocker/cosmos-qwen3.5 with Anima 2B.
The node loads, runs, and generates images with no errors, but the results are consistently worse than the base Qwen 3 0.6B text encoder that ships with Anima.
Screenshots
The Qwen 3.5 4B model is a hybrid SSM/attention architecture (not a standard transformer), so it required a full custom implementation rather than a simple config swap.
The SSM recurrence implementation may behave differently at inference compared to the original training framework (e.g. chunked parallel scan vs. sequential). Subtle numerical differences could accumulate across 24 SSM layers.
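For intuition, here's a toy NumPy sketch of that concern. This is not the actual kernel, and the chunking scheme is only an assumption about how chunked scan kernels are typically structured: both functions compute the same linear recurrence h[t] = a[t]·h[t-1] + x[t], but the chunked version reorders the floating-point operations, so in low precision the results drift slightly — the kind of difference that could accumulate across 24 layers:

```python
import numpy as np

def ssm_sequential(a, x):
    """Sequential scan: h[t] = a[t] * h[t-1] + x[t]."""
    h = np.empty_like(x)
    carry = 0.0
    for t in range(len(x)):
        carry = a[t] * carry + x[t]
        h[t] = carry
    return h

def ssm_chunked(a, x, chunk=8):
    """Same recurrence, chunked: within each chunk, run a local scan with a
    zero initial state, then fold the carried-in boundary state back in via a
    cumulative product of the decays. Mathematically identical to the
    sequential scan, but the op order (and thus the rounding) differs."""
    h = np.empty_like(x)
    carry = 0.0
    for s in range(0, len(x), chunk):
        e = min(s + chunk, len(x))
        cumprod = np.cumprod(a[s:e])  # decay applied to the incoming carry
        local = 0.0
        for i, t in enumerate(range(s, e)):
            local = a[t] * local + x[t]      # local scan, zero init
            h[t] = local + cumprod[i] * carry
        carry = h[e - 1]
    return h

# In float32 the two agree to high relative precision but are not bitwise
# identical; small per-layer discrepancies like this can compound in a deep
# stack of SSM blocks.
```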
Custom node: https://github.com/GumGum10/comfyui-qwen35-anima
If anyone has ideas on why the larger encoder underperforms, or knows whether the LLM adapter needs retraining for different text encoders, I'd appreciate the input.
You could try the 2B variant, though I don't know if that will improve things. I actually tried Goekdeniz-Guelmez/Josiefied-Qwen3-0.6B-abliterated-v1 as the text encoder for Anima to see if it makes a difference, but similar to Z-Image Turbo with its different uncensored, thinking, and instruct variants of Qwen3 4B, it doesn't work properly compared to the default text encoder the model was trained on. I think your best bet would be using the default text encoder the model comes with, plus maybe a prompt enhancer node (with a local LLM or one via OpenRouter) and/or experimenting with different prompts (natural language and Danbooru tags...), as it works surprisingly well, especially with solid negative prompts, quality tags, and weighted tags where needed. Try not to use too many weighted tags in a single prompt, as that can confuse the model a bit, but they do work on the parts of the prompt you want to be more prominent and visible.
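On the prompt-enhancer idea: a minimal sketch of what that could look like outside ComfyUI, calling OpenRouter's OpenAI-compatible chat completions endpoint. The model name and system prompt here are placeholder choices for illustration, not recommendations:

```python
import json
import urllib.request

def enhance_prompt_payload(prompt, model="meta-llama/llama-3.1-8b-instruct"):
    """Build an OpenAI-compatible chat request asking an LLM to expand a short
    tag-style prompt into a detailed one. The model name is just an example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": (
                    "Rewrite the user's image prompt as a single detailed, "
                    "comma-separated prompt. Reply with the prompt only."
                ),
            },
            {"role": "user", "content": prompt},
        ],
    }

def enhance_prompt(prompt, api_key, model="meta-llama/llama-3.1-8b-instruct"):
    """Send the request to OpenRouter and return the enhanced prompt text."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(enhance_prompt_payload(prompt, model)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)["choices"][0]["message"]["content"]
```

The same payload shape works against any local OpenAI-compatible server (e.g. pointing the URL at a llama.cpp or Ollama endpoint instead).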
Also check out a great stabilizer/quality LoRA called RDBT - Anima: it helps a lot with prompt adherence. And checkpoints like AnimaYume further enhance the model with custom datasets from creators experienced in fine-tuning models from the Pony, Illustrious, and NoobAI family...
Great work! On the custom node front, I tried building my own for a different model last year and it didn't really work well, so I know how hard it is and how much time and effort it takes to build one, even with the help of Gemini and ChatGPT as I used. I hope you manage to get it working. It definitely seems interesting, and the more experimenting the better, especially for new models. I will definitely give it a try later.
If anyone has ideas on why the larger encoder underperforms, or knows whether the LLM adapter needs retraining for different text encoders, I'd appreciate the input.
It's because the model is not trained against Qwen 3.5. During training, the diffuser learns what kind of embedding "geometry" to expect from the text encoder; with a completely different model as the TE, none of the embeddings or hidden states at inference match what the diffuser was trained with.
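A toy illustration of that geometry mismatch, with random projections standing in for real encoders (nothing Anima- or Qwen-specific here): two encoders can each produce a perfectly reasonable embedding of the same input, yet the two embeddings are nearly orthogonal to each other, so a diffuser calibrated to one space receives something close to noise from the other:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256

# Stand-ins for two different text encoders: independent random projections.
enc_trained = rng.normal(size=(d, d)) / np.sqrt(d)  # the TE the diffuser saw in training
enc_swapped = rng.normal(size=(d, d)) / np.sqrt(d)  # a different TE dropped in at inference

features = rng.normal(size=d)  # stand-in for the token features of one prompt
emb_trained = enc_trained @ features
emb_swapped = enc_swapped @ features

# Cosine similarity between the two embeddings of the *same* prompt.
cos = emb_trained @ emb_swapped / (
    np.linalg.norm(emb_trained) * np.linalg.norm(emb_swapped)
)
print(f"cosine similarity: {cos:.3f}")  # close to 0: the geometries don't line up
```

This is also why adapter retraining (or at least fine-tuning a projection layer) is usually needed when swapping text encoders: something has to map the new encoder's space onto the one the diffuser expects.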
You could try the 2B variant, though I don't know if that will improve things. I actually tried Goekdeniz-Guelmez/Josiefied-Qwen3-0.6B-abliterated-v1 as the text encoder for Anima to see if it makes a difference, but similar to Z-Image Turbo with its different uncensored, thinking, and instruct variants of Qwen3 4B, it doesn't work properly compared to the default text encoder the model was trained on.
The text encoder that comes with Anima is already uncensored. (or rather: not yet censored)
Hi, thanks for the feedback! I've iterated quite a bit and it's working nicely with Anima now, please see below:
https://huggingface.co/lylogummy/anima2b-qwen-3.5-4b
https://civitai.com/models/2455272?modelVersionId=2760745
https://github.com/GumGum10/comfyui-qwen35-anima
Honestly, man, well done, that looks much better! However, I'm curious: is the quality similar to the current 0.6B text encoder? I'll definitely try this out. Have you ever considered working on an LLM adapter for Illustrious?
I don't get it, why are you comparing one mismatched TE to another mismatched TE? Qwen3_0.6B =/= Qwen3_0.6B-Base. Anima requires BASE, the non-post-trained, unaligned base model of Qwen3_0.6B. Improvements over Qwen3_0.6B are not relevant, as it's not the correct TE; it would be more interesting, and more importantly relevant, if you provided comparisons to Qwen3_0.6B-Base. (Unless it's just a labeling issue, in which case shame on you, but I retract everything 🥳. I don't have time to test it myself anytime soon, so I'm taking your results at face value.)
Hi! Thank you so much! It's similar to 0.6B, not better, and certainly worse at natural language. Still working the quirks out, and I'm seeing room for improvement. For Illustrious I'm not sure it's worth it, as it was trained with CLIP rather than an LLM, so at best an LLM would translate natural text to tags.
Hi! Thanks for pointing that out... it's a labeling issue. I'll update the docs to reflect the correct name.
Cool, glad to see it. In the same vein, is there a reason you used Qwen3.5_4b rather than Qwen3.5_4b-base? Doesn't the conversational post-training, alignment, and guardrails make the job of adapting it to work with Anima that much more difficult, for probably negative net benefit? Just curious.