🔥🚀🌟 New Research Alert - YOCO! 🌟🚀🔥
📄 Title: You Only Cache Once: Decoder-Decoder Architectures for Language Models 🌟
📝 Description: YOCO is a novel decoder-decoder architecture for LLMs that reduces GPU memory requirements, speeds up prefilling, and retains global attention. It pairs a self-decoder, which encodes a single global KV cache, with a cross-decoder that reuses this cache via cross-attention (a minimal sketch of the layout follows below).
👥 Authors: Yutao Sun et al.
📄 Paper: You Only Cache Once: Decoder-Decoder Architectures for Language Models (2405.05254)
📁 Repository: https://github.com/microsoft/unilm/tree/master/YOCO
📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin
🔍 Keywords: #YOCO #DecoderDecoder #LargeLanguageModels #EfficientArchitecture #GPUMemoryReduction #PrefillingSpeedup #GlobalAttention #DeepLearning #Innovation #AI
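For a concrete picture of the decoder-decoder layout, here is a minimal PyTorch sketch of the idea, not the authors' implementation: it uses plain multi-head attention where the paper uses efficient self-attention (e.g. sliding-window attention or gated retention), and all module and parameter names are illustrative. The self-decoder runs once over the input to produce a single global KV cache, and every cross-decoder layer reuses that cache through cross-attention.

```python
# Minimal YOCO-style decoder-decoder sketch (illustrative only, not the official code).
import torch
import torch.nn as nn


class SelfDecoderLayer(nn.Module):
    """Causal self-attention layer; the self-decoder's final output is
    projected once into the shared global KV cache."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, causal_mask):
        h, _ = self.attn(x, x, x, attn_mask=causal_mask, need_weights=False)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))


class CrossDecoderLayer(nn.Module):
    """Cross-attends to the KV cache produced once by the self-decoder,
    so no per-layer KV cache is stored in this half of the stack."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, kv, causal_mask):
        h, _ = self.cross_attn(x, kv, kv, attn_mask=causal_mask, need_weights=False)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))


class YOCOSketch(nn.Module):
    def __init__(self, vocab=32000, d_model=256, n_heads=4, n_self=2, n_cross=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.self_decoder = nn.ModuleList(SelfDecoderLayer(d_model, n_heads) for _ in range(n_self))
        self.to_kv = nn.Linear(d_model, d_model)  # builds the single global KV cache
        self.cross_decoder = nn.ModuleList(CrossDecoderLayer(d_model, n_heads) for _ in range(n_cross))
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, tokens):
        T = tokens.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # causal mask
        x = self.embed(tokens)
        for layer in self.self_decoder:   # encode the context once
            x = layer(x, mask)
        kv = self.to_kv(x)                # cached once, reused by every cross-decoder layer
        for layer in self.cross_decoder:
            x = layer(x, kv, mask)
        return self.lm_head(x)


if __name__ == "__main__":
    model = YOCOSketch()
    logits = model(torch.randint(0, 32000, (1, 16)))
    print(logits.shape)  # torch.Size([1, 16, 32000])
```

Because only one KV cache is kept for the whole cross-decoder stack, cache memory no longer grows with the number of cross-decoder layers, which is the intuition behind YOCO's memory reduction and prefilling speedup.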