Elie Bakouch PRO

eliebak

AI & ML interests

Training LLMs @ 🤗

Recent Activity

liked a Space about 11 hours ago

scratchtoscale/training-time-calculator

liked a dataset 4 days ago

openbmb/DensingLaw-ScalingBench

liked a model 4 days ago

Delta-Vector/Austral-4.5B-Winton

upvoted a paper 6 days ago

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15

upvoted a collection 8 days ago

Tiny Language Model Datasets

Collection of synthetic datasets that can be used in pretraining of any Tiny Language Model • 14 items • Updated about 5 hours ago • 29

upvoted a paper 13 days ago

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Paper • 2508.18672 • Published 27 days ago • 10

upvoted a paper 19 days ago

Fantastic Pretraining Optimizers and Where to Find Them

Paper • 2509.02046 • Published 20 days ago • 12

upvoted a paper 21 days ago

AWorld: Orchestrating the Training Recipe for Agentic AI

Paper • 2508.20404 • Published 25 days ago • 38

upvoted a paper 22 days ago

Motif 2.6B Technical Report

Paper • 2508.09148 • Published Aug 2 • 4

upvoted a paper about 1 month ago

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 86

upvoted 2 articles about 1 month ago

Article

Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨

By

and 2 others •

Jul 25

• 81

Article

NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset

By

and 4 others •

Aug 20

• 17

upvoted 2 collections about 1 month ago

Seed-OSS

Seed-OSS Open-Source Models • 3 items • Updated Aug 20 • 58

DeepSeek-V3.1

3 items • Updated Aug 21 • 228

upvoted an article about 1 month ago

Article

MCP for Research: How to Connect AI to Research Tools

By

•

Aug 18

• 54

upvoted 3 papers about 1 month ago

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

Paper • 2508.10975 • Published Aug 14 • 59

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Paper • 2507.11407 • Published Jul 15 • 57

μ-Parametrization for Mixture of Experts

Paper • 2508.09752 • Published Aug 13 • 10

upvoted 2 articles about 1 month ago

Article

How to train a Language Model with Megatron-LM

By

•

Sep 7, 2022

• 19

Article

NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks

By

and 4 others •

Aug 11

• 73

upvoted 3 articles about 2 months ago

Article

Welcome GPT OSS, the new open-source model family from OpenAI!

By

and 11 others •

Aug 5

• 495

Article

retrain-pipelines and the almighty function-caller

By

•

Apr 28

• 8

Article

Introducing Command A Vision: Multimodal AI built for Business

By

and 3 others •

Jul 31

• 63