Running large language models efficiently takes more than raw GPU power. The latest guide breaks down the essential math for determining whether your LLM workload is compute-bound or memory-bound.
We apply these principles to a real-world example: Qwen's 32B parameter model on the new NVIDIA RTX PRO 6000 Blackwell Edition.
In this guide, you will learn how to:
Calculate your GPU's operational intensity (Ops:Byte Ratio)
Determine your model's arithmetic intensity
Identify whether your workload is memory-bound or compute-bound
Read the full guide here: https://flozi.net/en/guides/ai/llm-inference-math
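The three steps above boil down to comparing two ratios. Here is a minimal Python sketch of that math; the GPU figures and the single-token decode model are illustrative assumptions for this post, not official specs or the guide's exact numbers:

```python
def ops_byte_ratio(peak_flops: float, mem_bandwidth: float) -> float:
    """GPU operational intensity: FLOPs the chip can do per byte it can move."""
    return peak_flops / mem_bandwidth

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Model arithmetic intensity: FLOPs performed per byte of data read."""
    return flops / bytes_moved

# Assumed FP16 figures for the GPU: ~500 TFLOP/s compute, ~1.8 TB/s bandwidth.
gpu_ratio = ops_byte_ratio(500e12, 1.8e12)

# Single-token decode on a 32B-parameter model: roughly 2 FLOPs per
# parameter, while every FP16 weight (2 bytes) must be read once.
params = 32e9
model_intensity = arithmetic_intensity(2 * params, 2 * params)

# If the model does fewer FLOPs per byte than the GPU can supply,
# the workload is memory-bound; otherwise it is compute-bound.
regime = "memory-bound" if model_intensity < gpu_ratio else "compute-bound"
print(f"GPU ops:byte ≈ {gpu_ratio:.0f}, model intensity = {model_intensity:.0f}")
print(f"Decode is {regime}")
```

With these assumed numbers the GPU can perform hundreds of FLOPs per byte moved while single-token decode performs roughly one, which is why LLM decoding is typically memory-bandwidth-limited.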