Spaces: microsoft/MInference

iofu728 committed on Jul 7

Commit 962818f • 1 Parent(s): c3766a1

Feature(MInference): update information

Files changed (1)

app.py CHANGED

@@ -26,9 +26,6 @@ _Huiqiang Jiang†, Yucheng Li†, Chengruidong Zhang†, Qianhui Wu, Xufang Luo
 - 🪗 [24/07/07] Thanks @AK for sponsoring. You can now use MInference online in the [HF Demo](https://huggingface.co/spaces/microsoft/MInference) with ZeroGPU.
 - 🧩 [24/07/03] We will present **MInference 1.0** at the _**Microsoft Booth**_ and _**ES-FoMo**_ at ICML'24. See you in Vienna!
-## TL;DR
-**MInference 1.0** leverages the dynamic sparse nature of LLMs' attention, which exhibits some static patterns, to speed up the pre-filling for long-context LLMs. It first determines offline which sparse pattern each head belongs to, then approximates the sparse index online and dynamically computes attention with the optimal custom kernels. This approach achieves up to a **10x speedup** for pre-filling on an A100 while maintaining accuracy.
 <font color="brown"><b>This is only a deployment demo. You can follow the code below to try MInference locally.</b></font>
 ```bash
