MInference

MInference's activity

liyucheng

updated a dataset 11 days ago

MInference/bfd

Viewer • Updated 11 days ago • 9.31k • 65

liyucheng

updated a model 12 days ago

MInference/qwen25-math-7b-instruct

Text Generation • Updated 12 days ago • 1.3k

liyucheng

published a model 12 days ago

MInference/qwen25-math-7b-instruct

Text Generation • Updated 12 days ago • 1.3k

liyucheng

published a dataset 17 days ago

MInference/bfd

Viewer • Updated 11 days ago • 9.31k • 65

liyucheng

updated 2 models about 2 months ago

MInference/llava-vid

MInference/longvila-qwen-7b-1m

Updated Jan 20 • 16

liyucheng

published a model about 2 months ago

MInference/longvila-qwen-7b-1m

Updated Jan 20 • 16

liyucheng

updated a dataset about 2 months ago

MInference/llava-vid

Updated Jan 20 • 20

liyucheng

published a dataset about 2 months ago

MInference/llava-vid

Updated Jan 20 • 20

liyucheng

published a model about 2 months ago

MInference/llava-vid

liyucheng

updated a dataset about 2 months ago

MInference/v-niah-haystack

Viewer • Updated Jan 15 • 3 • 51

liyucheng

authored a paper 3 months ago

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Paper • 2412.10319 • Published Dec 13, 2024 • 10

iofu728

authored a paper 3 months ago

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Paper • 2412.10319 • Published Dec 13, 2024 • 10

iofu728

updated a dataset 3 months ago

MInference/SCBench

Viewer • Updated Dec 13, 2024 • 922 • 256

liyucheng

updated a dataset 3 months ago

MInference/SCBench

Viewer • Updated Dec 13, 2024 • 922 • 256

liyucheng

updated a dataset 4 months ago

MInference/mt-bench

Preview • Updated Nov 13, 2024 • 498

iofu728

authored a paper 6 months ago

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Paper • 2409.10516 • Published Sep 16, 2024 • 41

liyucheng

authored a paper 7 months ago

Data Contamination Report from the 2024 CONDA Shared Task

Paper • 2407.21530 • Published Jul 31, 2024 • 10

iofu728

posted an update 8 months ago

Post

Welcome to MInference. It exploits the dynamic sparsity of LLM attention, which nonetheless exhibits some static patterns, to speed up pre-filling for LLMs with million-token contexts. It first determines offline which sparse pattern each attention head belongs to, then approximates the sparse index online and dynamically computes attention with optimized custom kernels. This approach achieves up to a 10x pre-filling speedup on an A100 while maintaining accuracy at 1M tokens.

For more details, please see:
project page: https://aka.ms/MInference
code: https://github.com/microsoft/MInference
paper: MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention (2407.02490)
hf demo: microsoft/MInference
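
To make the "approximate the sparse index online, then compute attention sparsely" idea concrete, here is a minimal pure-Python sketch for a single head. It is not the MInference implementation or its kernels: the function names, the `n_probe` and `top_k` parameters, and the choice of a single "vertical column" pattern are all assumptions made for illustration, and the offline per-head pattern search is not shown.

```python
import math

def _softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def _scores(q, K, cols):
    # Scaled dot-product scores of one query against the selected key columns.
    scale = math.sqrt(len(q))
    return [sum(a * b for a, b in zip(q, K[j])) / scale for j in cols]

def _attend(q, K, V, cols):
    # Attention output for one query, restricted to the key/value columns in `cols`.
    w = _softmax(_scores(q, K, cols))
    return [sum(wj * V[j][d] for wj, j in zip(w, cols)) for d in range(len(V[0]))]

def dense_causal_attention(Q, K, V):
    # Reference: every query attends to all keys at or before its own position.
    return [_attend(q, K, V, list(range(i + 1))) for i, q in enumerate(Q)]

def estimate_vertical_index(Q, K, n_probe, top_k):
    # Hypothetical online step: score key columns using only the last
    # `n_probe` queries, then keep the `top_k` columns with the most mass.
    n = len(Q)
    mass = [0.0] * n
    for i in range(max(0, n - n_probe), n):
        cols = list(range(i + 1))
        for wj, j in zip(_softmax(_scores(Q[i], K, cols)), cols):
            mass[j] += wj
    ranked = sorted(range(n), key=lambda j: mass[j], reverse=True)
    return sorted(ranked[:top_k])

def sparse_causal_attention(Q, K, V, keep):
    # Each query attends only to the kept columns before it, plus its own
    # position, so every row has at least one key to attend to.
    out = []
    for i, q in enumerate(Q):
        cols = [j for j in keep if j < i] + [i]
        out.append(_attend(q, K, V, cols))
    return out
```

In this sketch, calling `estimate_vertical_index(Q, K, n_probe=2, top_k=3)` and passing the result to `sparse_causal_attention` touches at most four key columns per query instead of all preceding ones; with `top_k` equal to the sequence length it reduces to dense causal attention. Any real speedup depends on GPU kernels that skip the dropped columns, which plain Python does not model.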

liyucheng

authored a paper 8 months ago

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Paper • 2407.02490 • Published Jul 2, 2024 • 25