Consistent textures
Hey!
Thanks for the amazing work.
So I've been trying to get the model to generate consistent texture maps, but it doesn't seem to be working well. I've integrated different fine-tuned canny ControlNet models, but the results are subpar compared to the croissant on the homepage.
Could you please share the codebase you used to generate those or the general methodology and the checkpoints used? It'll be a great help.
Thanks!
Regarding the croissant, the images were generated using an ancient version of the SeargeSDXL workflow in ComfyUI: 1024x1024 resolution, 50 steps at CFG 8.0 with DPM++ 2M Karras, on the standard sdxl-base 1.0 model. The rank-256 canny control LoRA was used for the ControlNet, with the albedo map image (which was generated first) as the input, preprocessed with a low threshold of 0.0 and a high threshold of 1.0, at a strength of 1.0. As for the prompts, I only used the "main prompt" node with the operating mode set to "main and neg. only"; the prompts themselves are the ones in the demonstration images.
All of these values should be quite flexible though. As long as the preprocessed texture has enough edges for the model to make sense of it, results should be roughly consistent.
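If you're not in ComfyUI, a rough diffusers equivalent of those settings might look like the sketch below. To be clear about the assumptions: it swaps the rank-256 control LoRA for the standalone diffusers canny SDXL ControlNet (not the same weights), the OpenCV canny thresholds are on a 0-255 scale rather than ComfyUI's normalized 0-1, and the LoRA path and prompts are placeholders.

```python
# Rough diffusers sketch of the settings above, not the original pipeline.
# Assumptions: the diffusers canny SDXL ControlNet stands in for the rank-256
# control LoRA, and "albedo.png" / the prompts / the LoRA path are placeholders.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import (ControlNetModel, DPMSolverMultistepScheduler,
                       StableDiffusionXLControlNetPipeline)

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
# DPM++ 2M Karras
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
# Load the texture LoRA (placeholder path)
pipe.load_lora_weights("path/to/texture_lora.safetensors")

# Canny edges from the albedo map. ComfyUI's canny node takes normalized 0-1
# thresholds; OpenCV wants 0-255, so these values are only an approximation.
albedo = Image.open("albedo.png").convert("RGB").resize((1024, 1024))
edges = cv2.Canny(np.array(albedo), 0, 255)
edge_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    prompt="texture map of ...",        # main prompt
    negative_prompt="...",              # negative prompt
    image=edge_image,
    num_inference_steps=50,
    guidance_scale=8.0,
    controlnet_conditioning_scale=1.0,  # controlnet strength
).images[0]
image.save("generated_texture.png")
```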
Back when I still worked on this thing, I created a small webui that automatically generates all of the textures from one input prompt. I have uploaded that... abortion of a codebase to GitHub (https://github.com/dog-god-rus/sdxl-texture-synthesis), in case you want to see how the images were labeled and use the "intended" method of generating new ones. I sure hope it still works!
Additionally, I plan on making a FLUX.1/SD 3.5 version of this LoRA in the future, as SDXL is quite outdated at this point.
Thank you so much for taking the time to give a detailed response! I'll try some of your configurations and see how it goes.
Looking forward to your FLUX checkpoints, although I'm not sure if I can even run them since they are so huge.
Quick question: do you think the 2D checkpoint here would be able to generate consistent texture maps at a higher quality, or do you think you'd get better results by having a separate model for each texture map? Of course the complexity would rise, but would the tradeoff be worth it?
Idk, maybe that's something you might need to consider when working on the flux project.
That was basically the whole point of the thesis. Combining all of the texture maps into one model increases consistency at the cost of extra training time and a more complex latent space. Separating them does lead to better performance on a case-by-case basis, but the models end up having completely different ideas of what "a texture of X" should look like, since the individual models' training paths generally diverge.
(example from the thesis, individual models' results with the same seed and text description)
To glue the whole thing together I used a dual-token encoding schema to separately teach the model the concept of a texture map. (Although the gains are basically within rounding error so in retrospect it wasn't very necessary at all.)
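For anyone curious, the caption structure looks roughly like the sketch below; the token names and map types here are made up for illustration, the actual labeling is in the repo linked above.

```python
# Illustrative sketch of a dual-token caption schema (token names and map
# types are hypothetical; see the linked repo for the real labeling).
SHARED_TOKEN = "texturemap"        # shared concept: "this is a texture map"
MAP_TOKENS = {                     # per-map-type concept tokens
    "albedo": "albedomap",
    "normal": "normalmap",
    "roughness": "roughnessmap",
    "height": "heightmap",
}

def build_caption(map_type: str, description: str) -> str:
    """Compose a training/inference caption from the shared and per-map tokens."""
    return f"{SHARED_TOKEN}, {MAP_TOKENS[map_type]}, {description}"

print(build_caption("albedo", "flaky golden croissant pastry"))
# texturemap, albedomap, flaky golden croissant pastry
```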
Might be an interesting idea to look into synchronizing the training paths though.