Instructions to use EnsueAI/metal-int4-sdpa with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Kernels
How to use EnsueAI/metal-int4-sdpa with Kernels:
    # !pip install kernels
    from kernels import get_kernel

    kernel = get_kernel("EnsueAI/metal-int4-sdpa")
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ library_name: kernels
 
 Core attention kernel from [Open-TQ-Metal](https://github.com/mutable-state-inc/Open-TQ-Metal).
 
-Open-TQ-Metal is a Metal-native implementation of fused compressed-domain attention built by Ensue. The full release enables Llama 3.1 70B at 128K context on a single 64GB Mac, and includes a C++ inference engine, multiple attention kernels, a 330-experiment cross-architecture analysis, and a paper.
+Open-TQ-Metal is a Metal-native implementation of fused compressed-domain attention built by Ensue. The full release enables Llama 3.1 70B at 128K context on a single 64GB Mac, 48x faster attention at 128K context, and includes a C++ inference engine, multiple attention kernels, a 330-experiment cross-architecture analysis, and a paper.
 
 - Paper: https://arxiv.org/pdf/2604.16957
 - Write-up: https://ensue.dev/blog/introducing-open-tq-metal/
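The README's claim of 128K context on a 64GB Mac is easier to judge with rough KV-cache arithmetic. The sketch below is not taken from the Open-TQ-Metal code. It assumes the publicly documented Llama 3.1 70B shape (80 layers, 8 KV heads, head dimension 128), takes 128K to mean 131,072 tokens, and assumes, based only on the kernel's name, that keys and values are stored at 4 bits each. The helper `kv_cache_bytes` is a hypothetical name used for illustration.

```python
# Back-of-the-envelope KV-cache sizing. Illustrative only: the model shape is
# the published Llama 3.1 70B configuration, and 4-bit KV storage is an
# assumption inferred from the kernel name, not from the Open-TQ-Metal source.
N_LAYERS = 80        # transformer layers
N_KV_HEADS = 8       # grouped-query attention KV heads
HEAD_DIM = 128       # dimension per head
CONTEXT = 128 * 1024 # 128K tokens

GIB = 1024 ** 3


def kv_cache_bytes(bits_per_value: int, tokens: int = CONTEXT) -> int:
    """Bytes needed to hold keys and values for `tokens` positions."""
    # Factor of 2 covers one key vector and one value vector per head.
    values_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM
    return values_per_token * tokens * bits_per_value // 8


fp16_gib = kv_cache_bytes(16) / GIB
int4_gib = kv_cache_bytes(4) / GIB
print(f"fp16 KV cache: {fp16_gib:.1f} GiB")  # prints "fp16 KV cache: 40.0 GiB"
print(f"int4 KV cache: {int4_gib:.1f} GiB")  # prints "int4 KV cache: 10.0 GiB"
```

The int4 figure ignores per-group scales or zero-points, which would add some overhead, and the model weights need their own memory on top of the cache.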