MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated 1 day ago • 97
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1 • 144
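The core trick in this paper (L-Mul) replaces the mantissa product inside a floating-point multiply with a fixed additive offset, so the whole multiplication reduces to additions. A rough Python sketch of the idea, assuming the paper's offset rule l(m) (m for m ≤ 3, 3 for m = 4, else 4); this is an illustration of the arithmetic, not the paper's hardware-level implementation:

```python
import math

def l_mul(x: float, y: float, mantissa_bits: int = 4) -> float:
    """Approximate x * y with additions only: write x = (1 + xm) * 2**xe,
    y = (1 + ym) * 2**ye, and replace the cross term xm * ym with a fixed
    offset 2**-l(m), as L-Mul does."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    mx, ex = math.frexp(abs(x))        # abs(x) == mx * 2**ex, mx in [0.5, 1)
    my, ey = math.frexp(abs(y))
    xm, xe = 2.0 * mx - 1.0, ex - 1    # rescale to the (1 + f) * 2**e form
    ym, ye = 2.0 * my - 1.0, ey - 1
    # Offset exponent l(m) for m mantissa bits (my reading of the paper).
    l = mantissa_bits if mantissa_bits <= 3 else (3 if mantissa_bits == 4 else 4)
    return sign * (1.0 + xm + ym + 2.0 ** -l) * 2.0 ** (xe + ye)

print(l_mul(1.5, 2.25), 1.5 * 2.25)  # ~3.5 approximation vs. 3.375 exact
```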
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18 • 219
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated about 6 hours ago • 392
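All sizes load the same way through the transformers library; a minimal sketch using the smallest instruct checkpoint (the Qwen/Qwen2.5-0.5B-Instruct ID follows the collection's naming on the Hub):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # swap in any of the 7 sizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the Qwen2.5 lineup in one line."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```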
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 59
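Rectified flow trains the network to predict a constant velocity along the straight line between a data sample and noise; a minimal sketch of that training objective, assuming a hypothetical model(x_t, t) that returns a velocity prediction:

```python
import torch

def rectified_flow_loss(model, x0: torch.Tensor) -> torch.Tensor:
    """x0: a batch of clean samples. `model(x_t, t)` is assumed to predict
    the velocity field; regress it toward the straight-line target."""
    eps = torch.randn_like(x0)                     # terminal noise sample
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)))
    x_t = (1 - t) * x0 + t * eps                   # linear interpolant
    v_target = eps - x0                            # constant velocity of the line
    return ((model(x_t, t.flatten()) - v_target) ** 2).mean()
```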
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31 • 63
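The duality in the title can be checked numerically for a scalar state: the SSM recurrence and an attention-style masked matrix multiply give identical outputs. A toy sketch of that equivalence (standard linear-recurrence algebra, not the paper's efficient SSD algorithm):

```python
import torch

T = 6
a, b, c, x = (torch.rand(T) for _ in range(4))

# Recurrent (SSM) form: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t.
h, y_rec = torch.tensor(0.0), []
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y_rec.append(c[t] * h)
y_rec = torch.stack(y_rec)

# Matrix (attention-like) form: y = M @ x with a decaying causal mask,
# M[t, s] = c_t * (a_{s+1} * ... * a_t) * b_s for s <= t.
M = torch.zeros(T, T)
for t in range(T):
    for s in range(t + 1):
        M[t, s] = c[t] * torch.prod(a[s + 1 : t + 1]) * b[s]

print(torch.allclose(y_rec, M @ x))  # True: the two forms coincide
```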
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published May 8 • 9
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 28
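The mechanism is simple to sketch: layers in a group keep their own queries but attend against one shared set of keys/values, so the KV cache is stored once per group rather than once per layer. A toy PyTorch sketch of that sharing (illustrative module and names, not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerKVGroup(nn.Module):
    """A group of attention layers sharing one K/V projection, so at
    inference only one KV cache entry is kept for the whole group."""
    def __init__(self, dim: int, layers_per_group: int = 2):
        super().__init__()
        self.q_projs = nn.ModuleList(nn.Linear(dim, dim) for _ in range(layers_per_group))
        self.kv_proj = nn.Linear(dim, 2 * dim)  # computed (and cached) once

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k, v = self.kv_proj(x).chunk(2, dim=-1)  # shared keys/values
        for q_proj in self.q_projs:              # each layer: its own queries
            x = x + F.scaled_dot_product_attention(q_proj(x), k, v, is_causal=True)
        return x

out = CrossLayerKVGroup(dim=32)(torch.randn(1, 10, 32))
print(out.shape)  # torch.Size([1, 10, 32])
```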
Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24 • 59
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 253
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context variants. • 26 items • Updated 14 days ago • 500
Meta Llama 3 Collection This collection hosts the Transformers-format and original repos of the Meta Llama 3 and Llama Guard 2 releases. • 5 items • Updated Sep 25 • 683
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 63
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7 • 4