How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs • nielsr • Apr 7 • 61
⚡ nano-vLLM: Lightweight, Low-Latency LLM Inference from Scratch • zamal • Jun 28, 2025 • 41
Assisted Generation: a new direction toward low-latency text generation • joaogante • May 11, 2023 • 78
A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes • ybelkada, timdettmers • Aug 17, 2022 • 131
The Smol Training Playbook 📚: The secrets to building world-class LLMs • 3.16k