Article Universal Assisted Generation: Faster Decoding with Any Assistant Model Oct 29, 2024 • 52
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23, 2024 • 17
Post What if you could casually access your remote GPU in HF Spaces from the comfort of your local VSCode 🤯 8 replies · 👍 21 · 🤯 16 · 🤗 7