Any chance of Qwen Image Edit quants?
There are some from QuantStack, but the quality seems quite low at <= Q3. Maybe they're not using the "new dynamic logic where the first/last layer is kept in high precision" technique.
Edit: I should mention that your set of quants is great and gives really good results even down at Q3.
Closing because I see the metadata has the same structure in the QuantStack quants, so that's probably not the issue.
Yeah, I think those are also created with the latest code, which has the dynamic logic enabled by default for Qwen Image.
Might try and see if it can be improved for the edit model, though (IIRC I was seeing something similar with the Wan I2V model before, where it was more sensitive to quantization than the T2V one, so who knows).
For image-edit 2509, I did a first test. I still have to redo it with all models, but I get a clean image on the Q2_K quant:
// first/last block high precision test
if (arch == LLM_ARCH_QWEN_IMAGE) {
    if (
        (name.find("transformer_blocks.0.")  != std::string::npos) ||
        (name.find("transformer_blocks.59.") != std::string::npos)   // this should be dynamic (see the sketch below)
    ) {
        if (ftype == LLAMA_FTYPE_MOSTLY_Q2_K   ||
            ftype == LLAMA_FTYPE_MOSTLY_Q3_K_S ||
            ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M ||
            ftype == LLAMA_FTYPE_MOSTLY_Q3_K_L ||
            ftype == LLAMA_FTYPE_MOSTLY_Q4_0   ||
            ftype == LLAMA_FTYPE_MOSTLY_Q4_1   ||
            ftype == LLAMA_FTYPE_MOSTLY_Q4_K_S ||
            ftype == LLAMA_FTYPE_MOSTLY_Q4_K_M) {
            new_type = GGML_TYPE_Q5_K; // minimum Q5_K for low quants
        } else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_K_M) {
            new_type = GGML_TYPE_Q6_K;
        }
    }
}
did the trick, further testing...
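For reference, here is a minimal sketch of how the hardcoded 59 could be made dynamic, as the comment in the snippet suggests. The helper names (block_index, last_block_index) are hypothetical and assume a pre-pass over the tensor names before quantization starts:

#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: parse the block index out of a tensor name like
// "transformer_blocks.59.attn.to_q.weight"; returns -1 if the name does
// not belong to a transformer block.
static int block_index(const std::string & name) {
    const std::string prefix = "transformer_blocks.";
    const size_t pos = name.find(prefix);
    if (pos == std::string::npos) {
        return -1;
    }
    const size_t start = pos + prefix.size();
    const size_t end   = name.find('.', start);
    if (end == std::string::npos || end == start) {
        return -1;
    }
    const std::string idx = name.substr(start, end - start);
    if (idx.find_first_not_of("0123456789") != std::string::npos) {
        return -1; // not a numeric block index
    }
    return std::stoi(idx);
}

// One pass over all tensor names yields the highest block index, so the
// "last block" check no longer hardcodes 59.
static int last_block_index(const std::vector<std::string> & names) {
    int last = -1;
    for (const auto & n : names) {
        last = std::max(last, block_index(n));
    }
    return last;
}

With that, the two hardcoded string matches above reduce to checking block_index(name) == 0 || block_index(name) == last, which would also cover models with a different block count.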
So simply a bit higher precision for the first/last block? That would be rather easy, nice!
As I said, still testing, but compared to what I normally get it's like C64 vs. VGA. And it's my own GGUF from https://huggingface.co/Phr00t/Qwen-Image-Edit-Rapid-AIO, so it includes some LoRAs for 4- and 8-step generation.
Further pushing these 28 tensors to Q8_0 improves overall quality; I tried it since I was getting bad results with 8-step.
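For context, a minimal sketch of that Q8_0 variant, assuming the same quantization hook as the snippet above; it simply forces the matched tensors to Q8_0 instead of applying the Q5_K/Q6_K minimums:

// Q8_0 variant of the test above: keep every tensor in the first and
// last transformer block at Q8_0 regardless of the target ftype.
if (arch == LLM_ARCH_QWEN_IMAGE) {
    if (name.find("transformer_blocks.0.")  != std::string::npos ||
        name.find("transformer_blocks.59.") != std::string::npos) {
        new_type = GGML_TYPE_Q8_0; // the 28 tensors mentioned above
    }
}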