TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters • arXiv:2410.23168 • Published Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA • arXiv:2410.20672 • Published Oct 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss • arXiv:2410.17243 • Published Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models • arXiv:2410.17215 • Published Oct 2024
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models • arXiv:2410.05229 • Published Oct 7, 2024
Eurus (collection): Advancing LLM Reasoning Generalists with Preference Trees • 11 items
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale • arXiv:2409.17115 • Published Sep 25, 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning • arXiv:2409.12568 • Published Sep 19, 2024
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer • arXiv:2409.10819 • Published Sep 17, 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling • arXiv:2409.07146 • Published Sep 11, 2024
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining • arXiv:2409.02326 • Published Sep 3, 2024
Longhorn: State Space Models are Amortized Online Learners • arXiv:2407.14207 • Published Jul 19, 2024