- ⚡ nano-vLLM: Lightweight, Low-Latency LLM Inference from Scratch (Jun 28, 2025)
- Assisted Generation: a new direction toward low-latency text generation (May 11, 2023)
- A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using `transformers`, `accelerate` and `bitsandbytes` (Aug 17, 2022)
- The Smol Training Playbook 📚: The secrets to building world-class LLMs