- XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
  Paper • 2404.15420 • Published • 7
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 124
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 250
- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
  Paper • 2404.14047 • Published • 43
Collections including paper arxiv:2404.14219
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 124
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 58
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 250
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 82
- microsoft/Phi-3-mini-4k-instruct-gguf
  Text Generation • Updated • 43.8k • 454
- microsoft/Phi-3-mini-4k-instruct
  Text Generation • Updated • 2.77M • 1.03k
- microsoft/Phi-3-mini-128k-instruct-onnx
  Text Generation • Updated • 279 • 182
- microsoft/Phi-3-mini-128k-instruct
  Text Generation • Updated • 211k • 1.57k