🔥🚀🌟 New Research Alert - YOCO! 🌟🚀🔥
📄 Title: You Only Cache Once: Decoder-Decoder Architectures for Language Models 🌟
📝 Description: YOCO is a novel decoder-decoder architecture for LLMs that reduces GPU memory requirements, speeds up prefilling, and retains global attention. It pairs a self-decoder, which encodes a single global KV cache, with a cross-decoder that reuses this cache via cross-attention (a minimal sketch of the layout follows below).
👥 Authors: Yutao Sun et al.
📄 Paper: You Only Cache Once: Decoder-Decoder Architectures for Language Models (2405.05254)
📁 Repository: https://github.com/microsoft/unilm/tree/master/YOCO
📚 More Papers: more cutting-edge research presented at other conferences is collected in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin
🔍 Keywords: #YOCO #DecoderDecoder #LargeLanguageModels #EfficientArchitecture #GPUMemoryReduction #PrefillingSpeedup #GlobalAttention #DeepLearning #Innovation #AI
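For a concrete picture of the decoder-decoder layout, here is a minimal PyTorch sketch of the idea, not the authors' implementation: it uses plain multi-head attention where the paper uses efficient self-attention (e.g. sliding-window attention or gated retention), and all module and parameter names are illustrative. The self-decoder runs once over the input to produce a single global KV cache, and every cross-decoder layer reuses that cache through cross-attention.

```python
# Minimal YOCO-style decoder-decoder sketch (illustrative only, not the official code).
import torch
import torch.nn as nn


class SelfDecoderLayer(nn.Module):
    """Causal self-attention layer; the self-decoder's final output is
    projected once into the shared global KV cache."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, causal_mask):
        h, _ = self.attn(x, x, x, attn_mask=causal_mask, need_weights=False)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))


class CrossDecoderLayer(nn.Module):
    """Cross-attends to the KV cache produced once by the self-decoder,
    so no per-layer KV cache is stored in this half of the stack."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, kv, causal_mask):
        h, _ = self.cross_attn(x, kv, kv, attn_mask=causal_mask, need_weights=False)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))


class YOCOSketch(nn.Module):
    def __init__(self, vocab=32000, d_model=256, n_heads=4, n_self=2, n_cross=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.self_decoder = nn.ModuleList(SelfDecoderLayer(d_model, n_heads) for _ in range(n_self))
        self.to_kv = nn.Linear(d_model, d_model)  # builds the single global KV cache
        self.cross_decoder = nn.ModuleList(CrossDecoderLayer(d_model, n_heads) for _ in range(n_cross))
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, tokens):
        T = tokens.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # causal mask
        x = self.embed(tokens)
        for layer in self.self_decoder:   # encode the context once
            x = layer(x, mask)
        kv = self.to_kv(x)                # cached once, reused by every cross-decoder layer
        for layer in self.cross_decoder:
            x = layer(x, kv, mask)
        return self.lm_head(x)


if __name__ == "__main__":
    model = YOCOSketch()
    logits = model(torch.randint(0, 32000, (1, 16)))
    print(logits.shape)  # torch.Size([1, 16, 32000])
```

Because only one KV cache is kept for the whole cross-decoder stack, cache memory no longer grows with the number of cross-decoder layers, which is the intuition behind YOCO's memory reduction and prefilling speedup.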