EnsueAI/metal-int4-sdpa


christinetyip committed on Apr 21

Commit e241a15 · verified · 1 Parent(s): b123cc5

Update README.md

Files changed (1)

README.md +1 -1


@@ -14,7 +14,7 @@ library_name: kernels
 Core attention kernel from [Open-TQ-Metal](https://github.com/mutable-state-inc/Open-TQ-Metal).
-Open-TQ-Metal is a Metal-native implementation of fused compressed-domain attention built by Ensue. The full release enables Llama 3.1 70B at 128K context on a single 64GB Mac, and includes a C++ inference engine, multiple attention kernels, a 330-experiment cross-architecture analysis, and a paper.
+Open-TQ-Metal is a Metal-native implementation of fused compressed-domain attention built by Ensue. The full release enables Llama 3.1 70B at 128K context on a single 64GB Mac, 48x faster attention at 128K context, and includes a C++ inference engine, multiple attention kernels, a 330-experiment cross-architecture analysis, and a paper.
 - Paper: https://arxiv.org/pdf/2604.16957
 - Write-up: https://ensue.dev/blog/introducing-open-tq-metal/