MInference

Recent Activity

liyucheng  updated a dataset 11 days ago
MInference/bfd
liyucheng  updated a model 12 days ago
MInference/qwen25-math-7b-instruct
liyucheng  published a model 12 days ago
MInference/qwen25-math-7b-instruct

MInference's activity

iofu728 
posted an update 8 months ago
Welcome to MInference, which exploits the dynamic sparsity of LLM attention, together with the static patterns that sparsity exhibits, to speed up pre-filling for LLMs with million-token contexts. It first determines offline which sparse pattern each attention head belongs to, then approximates the sparse index online and dynamically computes attention with the optimal custom kernels. This approach achieves up to a 10x pre-filling speedup on an A100 while maintaining accuracy at 1M tokens.
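
To make the online step concrete, here is a toy NumPy sketch for a single head. It is an illustration under assumptions, not the MInference implementation or its API: the function names and the `last_q`, `top_k`, and `window` parameters are hypothetical, and the head is assumed to have already been assigned a "key columns plus local band" pattern offline. It also builds the full score matrix, so it shows only how a sparse index can be approximated and applied, not the speedup that custom kernels would provide.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_causal_attention(q, k, v):
    # Reference: full causal attention for one head.
    n, d = q.shape
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    scores = np.where(j <= i, q @ k.T / np.sqrt(d), -np.inf)
    return softmax(scores) @ v


def estimate_vertical_index(q, k, last_q=16, top_k=32):
    # Hypothetical online estimate: score only the last `last_q` queries
    # against all keys and keep the `top_k` key columns with the most mass.
    n, d = q.shape
    rows = np.arange(n - last_q, n)[:, None]
    cols = np.arange(n)[None, :]
    probe = np.where(cols <= rows, q[-last_q:] @ k.T / np.sqrt(d), -np.inf)
    col_mass = softmax(probe).sum(axis=0)
    return np.sort(np.argsort(col_mass)[-top_k:])


def sparse_causal_attention(q, k, v, cols, window=32):
    # Attend only to the selected key columns plus a local diagonal band.
    # Toy version: the mask is applied to a dense score matrix.
    n, d = q.shape
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    keep = np.zeros((n, n), dtype=bool)
    keep[:, cols] = True          # selected key columns
    keep |= (i - j) < window      # local band around the diagonal
    keep &= j <= i                # causality
    scores = np.where(keep, q @ k.T / np.sqrt(d), -np.inf)
    return softmax(scores) @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 256, 32
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    idx = estimate_vertical_index(q, k)
    approx = sparse_causal_attention(q, k, v, idx)
    exact = dense_causal_attention(q, k, v)
    print("max abs error:", np.abs(approx - exact).max())
```

With random inputs the attention is not actually sparse, so the printed error only demonstrates the mechanics; the approach relies on real attention heads concentrating their mass on a few columns and nearby tokens.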

For more details, please see:
project page: https://aka.ms/MInference
code: https://github.com/microsoft/MInference
paper: MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention (2407.02490)
hf demo: microsoft/MInference