Furkan Gözükara

MonsterMMORPG

AI & ML interests

Check out my YouTube channel, SECourses, for Stable Diffusion tutorials. They will help you tremendously with every topic.

Recent Activity

posted an update 1 day ago
NVIDIA Labs' SANA model weights and Gradio demo app have been published. Check out this amazing new text-to-image model from NVIDIA. (Full post below.)
updated a model 2 days ago
MonsterMMORPG/fixed_sana2
updated a model 2 days ago
MonsterMMORPG/fixed_sana

MonsterMMORPG's activity

posted an update 1 day ago
NVIDIA Labs' SANA model weights and Gradio demo app have been published. Check out this amazing new text-to-image model from NVIDIA.

Official repo : https://github.com/NVlabs/Sana

1-Click Windows, RunPod, Massed Compute installers and free Kaggle notebook : https://www.patreon.com/posts/116474081

You can follow the instructions in the repository to install and use it locally. I tested it on my Windows machine with RTX 3060 and 3090 GPUs.

I also measured generation speeds and VRAM usage.

It uses 9.5 GB of VRAM, but someone reported that it works well on 8 GB GPUs too.

Per-image generation speeds at default settings:

Free Kaggle account notebook on T4 GPU: 15 seconds
RTX 3060 (12 GB): 9.5 seconds
RTX 3090: 4 seconds
RTX 4090: 2 seconds
More info : https://nvlabs.github.io/Sana/

Works great on RunPod and Massed Compute (cloud) as well.

Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

About Sana (taken from the official repo):

We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on a laptop GPU. Core designs include:

- Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens.
- Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality.
- Decoder-only text encoder: we replaced T5 with a modern decoder-only small LLM as the text encoder and designed complex human instructions with in-context learning to enhance image-text alignment.
- Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence.
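The two headline design choices above can be sanity-checked in a few lines. The snippet below is an illustrative sketch, not Sana's actual implementation: it shows why a 32× autoencoder yields 16× fewer latent tokens than an 8× one at 4096 × 4096, and how reordering the attention product makes the cost linear in the number of tokens. The kernel feature map `phi` here is a placeholder assumption, not the one Sana uses.

```python
import numpy as np

# A 32x autoencoder vs. the usual 8x: far fewer latent tokens at 4096x4096.
tokens_8x = (4096 // 8) ** 2    # 262,144 latent tokens
tokens_32x = (4096 // 32) ** 2  # 16,384 latent tokens (16x fewer)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized attention computed as phi(Q) @ (phi(K)^T V).

    Vanilla attention materializes an (n, n) score matrix: O(n^2 * d).
    Reordering the product keeps only a (d, d) summary: O(n * d^2),
    which is what makes high token counts affordable.
    """
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V             # (d, d) summary, independent of sequence length n
    z = Qp @ Kp.sum(axis=0)   # per-query normalizer, shape (n,), strictly positive
    return (Qp @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 1024, 64
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
print(tokens_8x // tokens_32x)  # 16
print(out.shape)                # (1024, 64)
```

Note the order of operations: computing `Kp.T @ V` first is the whole trick, since that matrix never grows with the token count.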





replied to their post 7 days ago
Reacted to their post with 🤯🤝👍🧠😎🤗❤️👀🚀🔥 7 days ago
Kohya brought massive improvements to FLUX LoRA training (GPUs as low as 4 GB) and DreamBooth / Fine-Tuning training (GPUs as low as 6 GB). Check the attached images at full size to see the full details.

You can download all configs and full instructions

> https://www.patreon.com/posts/112099700 - Fine Tuning post

> https://www.patreon.com/posts/110879657 - LoRA post

Kohya brought massive improvements to FLUX LoRA and DreamBooth / Fine-Tuning (minimum 6 GB GPU) training.

Now GPUs with as little as 4 GB can train a FLUX LoRA with decent quality, and GPUs with 24 GB or less got a huge speed boost for full DreamBooth / Fine-Tuning training.

You need a minimum 4 GB GPU for FLUX LoRA training and a minimum 6 GB GPU for FLUX DreamBooth / full Fine-Tuning training. It is just mind-blowing.

You can download all configs and full instructions > https://www.patreon.com/posts/112099700

The above post also has 1-click installers and downloaders for Windows, RunPod, and Massed Compute.

The model downloader scripts were also updated; downloading 30+ GB of models takes about 1 minute in total on Massed Compute.

You can read the recent updates here : https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#recent-updates

This is the Kohya GUI branch : https://github.com/bmaltais/kohya_ss/tree/sd3-flux.1

The key to reducing VRAM usage is block swapping.

Kohya implemented OneTrainer's logic to improve block-swapping speed significantly, and it is now supported for LoRAs as well.

Now you can do FP16 training with LoRAs on GPUs with 24 GB or less.

Now you can train a FLUX LoRA on a 4 GB GPU; the keys are FP8, block swapping, and training only certain layers (remember single-layer LoRA training).
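The block-swapping idea above is easy to picture in code. The sketch below is a hypothetical, framework-free illustration of the concept, not Kohya's or OneTrainer's implementation: only a small window of transformer blocks is "resident" on the GPU at any moment, and the rest wait on the CPU, trading transfer time for peak memory. The class and function names are invented for the example.

```python
class Block:
    """Stand-in for one transformer block; `device` mimics its placement."""
    def __init__(self, idx):
        self.idx = idx
        self.device = "cpu"  # all blocks start swapped out

    def forward(self, x):
        assert self.device == "gpu", "a block must be on the GPU to run"
        return x + 1  # placeholder compute

def run_with_block_swap(blocks, x, max_resident=4):
    """Run blocks in order, keeping at most `max_resident` on the GPU."""
    resident = []
    for block in blocks:
        if len(resident) >= max_resident:
            evicted = resident.pop(0)  # swap the oldest block back to the CPU
            evicted.device = "cpu"
        block.device = "gpu"           # swap the next block onto the GPU
        resident.append(block)
        x = block.forward(x)
    return x

blocks = [Block(i) for i in range(19)]  # 19 blocks, purely for illustration
result = run_with_block_swap(blocks, 0, max_resident=4)
print(result)                                  # 19
print(sum(b.device == "gpu" for b in blocks))  # 4
```

Peak "GPU" occupancy never exceeds 4 blocks regardless of model depth, which is exactly why a larger swap window raises VRAM use and a smaller one lowers it at the cost of more transfers.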

It took me more than a day to test all the newer configs, their VRAM demands, and their relative step speeds, and to prepare the configs :)
posted an update 7 days ago
replied to maxiw's post 8 days ago
Reacted to maxiw's post with 🤗🚀👍🔥❤️ 10 days ago
I was curious to see what people post here on HF, so I created a dataset with all HF posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
❤️: 7048 times
🔥: 5921 times
👍: 4856 times
🚀: 2549 times
🤗: 2065 times