
Mitko Vasilev

mitkox

AI & ML interests

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

Recent Activity

liked a model about 1 month ago

deepseek-ai/DeepSeek-V3.1-Base

posted an update 7 days ago

Post

5493

I’ve built my blocker for AI-generated content. It’s a local AI running on my laptop with a browser extension that classifies and scrubs synthetic content from my eyeballs. I’m too old for this synthetic noise.

TL;DR: I’m going full John Connor on the AI content apocalypse.

Think of it as an on device AI ad-blocker, but for:
Em-dash overdose. Seriously, why is everything suddenly revolutionary—disruptive—life-changing?
AI influencers’ auto-generated posts and images, auto-posted, all hands-free.
Fake news, fake images, fake people... poof.

Surprisingly, it works. I suppose it will block some human-generated content. However, I would rather read a 2007 Myspace blog than another “10 Growth Hacks Powered By ChatGPT” post.
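As a rough illustration of the "em-dash overdose" signal only, here is a minimal sketch of a density heuristic. It is not the blocker described above, which uses a local AI model; the function names and the threshold of 3 per 100 words are assumptions made up for this example.

```python
EM_DASH = "\u2014"  # the character being counted

def em_dash_density(text: str) -> float:
    """Em-dashes per 100 whitespace-separated words (0.0 for empty text)."""
    words = text.split()
    if not words:
        return 0.0
    return 100 * text.count(EM_DASH) / len(words)

def looks_synthetic(text: str, threshold: float = 3.0) -> bool:
    """Flag text whose em-dash density exceeds an arbitrary, assumed threshold."""
    return em_dash_density(text) > threshold

hype = f"Everything is revolutionary{EM_DASH}disruptive{EM_DASH}life-changing"
plain = "I would rather read a 2007 Myspace blog."
print(looks_synthetic(hype), looks_synthetic(plain))
```

A single surface feature like this will misfire on human writers who like dashes, which is consistent with the caveat that some human-generated content gets blocked.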

3 replies

posted an update 25 days ago

Post

326

Hermes4 70B synthetic dataset generation on my desktop Z8 GPU rig:
307 tok/sec
1.1M tok/hour

The bottleneck for generating massive, high-quality reinforcement learning datasets is never the GPU compute; it's always the model's willingness to actually answer the darn question.
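The hourly figure follows directly from the per-second rate. A quick check, assuming the 307 tok/sec rate is sustained for the whole hour:

```python
# Convert a sustained generation rate into an hourly token count.
TOKENS_PER_SEC = 307  # rate quoted in the post

def tokens_per_hour(tokens_per_sec: float) -> float:
    """Tokens generated in one hour at a constant rate."""
    return tokens_per_sec * 3600

hourly = tokens_per_hour(TOKENS_PER_SEC)
print(f"{hourly:,.0f} tok/hour (~{hourly / 1e6:.1f}M)")
```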

posted an update about 2 months ago

Post

1736

Earlier today, humanity faced a critical threat from a catastrophic chart crime. I asked my local Qwen3 Coder Flash to fix it. Sleep well, fellow humans. The visualization singularity is now high, and it runs with zero warnings.

2 replies

posted an update about 2 months ago

Post

3308

I run Claude Code with Qwen3 Coder Flash locally on my MacBook Air. It works offline, zero cloud, zero internet, zero EU AI Act anxiety. No limits, and all tokens are on the house.

It’s not great, not terrible: adequate performance for an on device AI agent chewing through code on a 1.24 kg laptop. I wrote an interpreter to broker peace between Claude Code and my local AI runtime.

Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it.

3 replies

posted an update about 2 months ago

Post

3429

XBai o4 claims to beat Claude Opus 4 and o3-mini, and they provide verifiable proof. My skepticism circuits overloaded, but my local AI FOMO module screamed louder.
I've thrown this 33B monoblock LLM onto a single GPU and used Roo Code for some… let’s call it “vibe testing”. It’s terrifyingly competent. As an architect, it’s the best open-weight model I’ve touched this side of 2025.

1 reply

posted an update about 2 months ago

Post

2571

We’ve reached a point where on device AI coding that is free, offline, and capable isn’t just a theoretical possibility; it’s sitting on my lap, barely warming my thighs.
My local MacBook Air setup includes Qwen3 Coder Flash with a 1M context and Cline in VS Code. No internet, no cloud, no ID verification: this is the forbidden tech.
Current stats:
- All agentic tools work great: local, sandboxed, and MCP
- OK model output precision
- 17 tokens/sec. Not great, not terrible
- 65K tokens of context. The model can do 1M, but let’s be real, my MacBook Air would probably achieve fusion before hitting that smoothly
- Standard backend and cache off for the test
All inference and function calling happen locally, offline, untethered. The cloud didn’t even get a memo.

3 replies

posted an update about 2 months ago

Post

2624

I got 370 tokens/sec of Qwen3-30B-A3B 2507 on my desktop Z8 GPU workstation. My target is 400 t/s, and the last 10% always tastes like victory!

3 replies

posted an update about 2 months ago

Post

2102

I run Qwen3-Coder 480B locally on my Z8, with a 1-million token context window. It’s the equivalent of parallel-parking a Nimitz-class carrier in a kiddie pool. Thanks to whatever dark pact the llama.cpp, CUDA, and kernel folks signed, hybrid inferencing + VRAM↔RAM offload let me stream the model’s synapses across Xeon, RAM, and four lonely A6000s without summoning either the OOM killer or a small house fire.

posted an update 2 months ago

Post

269

I needed a distraction-free AI dev system. Omarchy-AI is an opinionated, minimalist, purpose-built OS layer for AI engineers. Arch Linux, stripped, hardened, and injected with pure, uncut AI developer ergonomics.

TL;DR Omarchy-AI. Vertical AF. One Job, Done Stupidly Well. Only cares about AI engineering.

It’s built on top of Arch Linux & Omarchy and further optimized for:
- Offline, on-the-go AI development. Yes, even on your gaming laptop, or freshly minted DIGITS
- Seamless shift to GPU server backends, because your PC shouldn’t train a 1T Kimi K2 model
- Pre-baked RAG pipelines, agentic workflows, and model fine-tuning
- Actual productivity to spend hours hacking local AI agents, not debugging uv conflicts.

How It Works (The Geeky Bits)
It’s One Curl Command to Rule Them All: it turns a vanilla Arch install into a batteries-included local AI dev beast with Hyprland, CUDA, llama.cpp, gcc, and every CLI tool you pretend to know about.

Hyprland: Because Your GPU Deserves Glam Shots. Picked it for the nihilists, tweakers, and keyboard cowboys. It’s an independent Wayland compositor that works great and has zero questions like “how do I get this pretty?”

Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it.

1 reply

posted an update 8 months ago

Post

3665

llama.cpp is 26.8% faster than ollama.
I have upgraded both and, using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec
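
The ratios quoted above can be recomputed from the raw figures. This is a small sketch that only redoes the arithmetic on the numbers in this post; the helper names are made up for the example, and nothing here re-runs the benchmark.

```python
# Figures copied from the post; the ratios are recomputed, not re-measured.
total_sec = {"llama.cpp": 6.85, "ollama": 8.69}
load_ms = {"llama.cpp": 241, "ollama": 553}
prompt_tps = {"llama.cpp": 416.04, "ollama": 42.17}
gen_tps = {"llama.cpp": 137.79, "ollama": 122.07}

def time_speedup(times: dict) -> float:
    """Speed ratio when lower is better (durations)."""
    return times["ollama"] / times["llama.cpp"]

def rate_speedup(rates: dict) -> float:
    """Speed ratio when higher is better (tokens per second)."""
    return rates["llama.cpp"] / rates["ollama"]

def time_saved(times: dict) -> float:
    """Fraction of ollama's wall-clock time that llama.cpp avoids."""
    return 1 - times["llama.cpp"] / times["ollama"]

print(f"total duration:    {time_speedup(total_sec):.2f}x")
print(f"model loading:     {time_speedup(load_ms):.2f}x")
print(f"prompt processing: {rate_speedup(prompt_tps):.2f}x")
print(f"token generation:  {rate_speedup(gen_tps):.2f}x")
print(f"wall-clock saved:  {time_saved(total_sec):.1%}")
```

The total comes out at roughly 1.27x, in line with the quoted 26.8% when "faster" is read as a speed ratio. Measured as wall-clock time saved, the same two durations give roughly 21%.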

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

7 replies

posted an update 8 months ago

Post

713

Stargate to the west of me
DeepSeek to the east
Here I am
Stuck in the middle with the EU

It will likely be a matter of sparkle to get export control on frontier research and models on both sides, leaving us in a vacuum.

Decentralized training infrastructure and on device inferencing are the future.

posted an update 8 months ago

Post

562

On device AI reasoning (ODA-R) using speculative decoding, with DeepSeek-R1-Distill-Qwen-1.5B as the draft model and DeepSeek-R1-Distill-Qwen-32B as the target model. DSPy compiler for reasoning prompts in math, engineering, code...
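
For readers new to speculative decoding, here is a toy sketch of the greedy variant: a cheap draft model proposes a few tokens, and the large model keeps the prefix it agrees with. The two "models" are small arithmetic functions standing in for the 1.5B and 32B checkpoints, and every name in the block is hypothetical. It is not the ODA-R pipeline, and real runtimes verify the proposals in one batched forward pass and often use a sampling-based acceptance rule instead.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function

def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output equals target-only greedy decoding."""
    seq = list(prompt)
    goal = len(prompt) + max_new
    while len(seq) < goal:
        # 1. The draft model cheaply proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target model checks each proposed position in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)       # draft guessed right, keep it
            else:
                accepted.append(expected)  # first miss: use the target's token
                break
        seq.extend(accepted)
    return seq

def greedy(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding, used as the reference."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq

def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 17

def toy_draft(seq: List[int]) -> int:
    tok = toy_target(seq)
    return tok if len(seq) % 3 else (tok + 1) % 17  # wrong at every third position

assert speculative_decode(toy_target, toy_draft, [1], 12) == greedy(toy_target, [1], 12)
```

The speedup in practice comes from the draft being right most of the time, so the large model validates several tokens per pass instead of producing one.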

posted an update 8 months ago

Post

1462

Training a model to reason in the continuous latent space based on Meta's Coconut.
If it all works, I will apply it to the MiniCPM-o SVD-LR.
Endgame is a multimodal, adaptive, and efficient foundational on device AI model.

2 replies

replied to their post 9 months ago

DDR5 on HP Z8 G5

replied to their post 9 months ago

exactly Q2 med with ~190GB RAM

posted an update 9 months ago

Post

2538

'Can it run DeepSeek V3 671B' is the new 'can it run Doom'.

How minimalistic can I go with on device AI with behemoth models - here I'm running DeepSeek V3 MoE on a single A6000 GPU.

Not great, not terrible, for this minimalistic setup. I love the Mixture of Experts architectures. Typically I'm running my core LLM distributed over the 4 GPUs.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

5 replies

posted an update about 1 year ago

Post

2560

I started Friday with decentralized AI using Gemma-2, and it all works without blockchain. This is what I did:

1. Pinned Gemma-2 9B in the InterPlanetary File System (IPFS) with the LoRA fine-tuning adapters.
2. Set up a llama-ipfs server to fetch and cache the model and adapters on the fly and run inference locally.
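
The fetch-and-cache step can be pictured as a content-addressed local cache: download by content ID on the first request, then serve from disk. This is a sketch under assumptions, since the post does not show how llama-ipfs does it. `cached_fetch`, `fake_fetch`, and the example ID are hypothetical, and a real fetcher would talk to an IPFS node or gateway instead of returning canned bytes.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

def cached_fetch(cid: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local path for `cid`, downloading only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / cid
    if not path.exists():
        tmp = cache_dir / (cid + ".part")  # write to a temp name first
        tmp.write_bytes(fetch(cid))
        tmp.replace(path)                  # then move into place
    return path

calls: List[str] = []

def fake_fetch(cid: str) -> bytes:
    """Stand-in for a real IPFS download; records each call."""
    calls.append(cid)
    return b"weights for " + cid.encode()

with tempfile.TemporaryDirectory() as tmp_dir:
    cache = Path(tmp_dir) / "models"
    first = cached_fetch("example-model-cid", cache, fake_fetch)
    second = cached_fetch("example-model-cid", cache, fake_fetch)
    assert first == second
    assert len(calls) == 1  # the second request was served from the cache
```

Because the file name is the content ID, two devices asking for the same ID always get the same bytes, which is what makes caching safe here.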

Now, I can use my on device AI platform across:

• All my macOS automation workflows
• All my browsers
• My Copilot++ in VSCode
• My Open Apple Intelligence (OAI, not to be confused with the other closed OAI owned by a nonprofit foundation and BigTech)

The llama-ipfs server’s RPC support lets me decentralize inferencing across all my devices, supercharging computing and energy efficiency.

Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it.

2 replies

posted an update about 1 year ago

Post

2238

I'm decentralizing my AI end2end, from the AI model distribution to on device AI inferencing. llama-ipfs is llama.cpp integrated with the InterPlanetary File System for distributing AI models peer-to-peer and loading them without the need for cloud storage or an AI model hub.

llama.cpp now supports decentralized inferencing with RPC, allowing the distribution of workload across all home devices. This functionality can be enhanced with a P2P ad-hoc VPN, enabling the extension of distributed inferencing to any device on any network.

Imagine an open-source AI that's as decentralized as a potluck dinner - everyone brings something to the table, and there's ZERO need for blockchain. It's like a digital fortress, with security and privacy baked right in, not to mention a dollop of integrity and trust. This could be the secret sauce for an enterprise AI platform, complete with an integrated IT policy. It might just be the cherry on top for the next generation of Apple Intelligence and Copilot+ PCs.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

posted an update about 1 year ago

Post

710

I'm decentralizing my AI. I'll be using Radicle for decentralized Git and IPFS for distributing AI models.

I believe there is a significant opportunity to democratize open AI development moving forward. I appreciate that Radicle is open-source, prioritizes local operations, functions offline, seeds data peer-to-peer from my node, is programmable, and incorporates built-in security features.

IPFS is great decentralized data storage, and I have already begun seeding SLMs and LoRA adapters. Tomorrow I will add my collection of LLMs, VLMs, and the other models and datasets I'm actively using. I have 10 Gbps fiber at home, so my node has enough bandwidth.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

2 replies

posted an update about 1 year ago

Post

3472

I've made an on device AI comparison between open source, Apple Intelligence, and Microsoft Copilot+ PC. This OS- and application-level integration will bring GenAI to everyone, be it consumers or businesses, over the next year.

Communities and BigTech hold divergent visions regarding the problems they aim to solve, ways to lock in users and enterprises, as well as their commercialization and GTM strategies.

I'm aware that this table has the potential to expand into an epic 30-page saga during an in-depth analysis, but hey, it's a beginning. Do you think I should throw in a few more comparisons? I'm all ears for your thoughts and critiques!

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

1 reply
