Thrilled to introduce Adam-mini, an optimizer that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training.
The design of Adam-mini is inspired by certain Hessian structures we observed in Transformers.
Feel free to try it out! Switch to Adam-mini with the same hyperparameters as AdamW, and it should work with only half the optimizer memory. Hope Adam-mini can help save time, cost, and energy in your tasks!
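Below is a minimal sketch of the intended drop-in swap. It assumes the optimizer is exposed as `Adam_mini` from an `adam_mini` package with an AdamW-like constructor; the exact import path and any extra constructor arguments may differ in the actual release, so treat the names here as illustrative.

```python
# Sketch only: `adam_mini` / `Adam_mini` are assumed names, and the real
# constructor may require additional model-specific arguments.
import torch
import torch.nn as nn

model = nn.Linear(128, 10)  # stand-in for your Transformer model

# Before: standard AdamW
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.95), weight_decay=0.1
)

# After: Adam-mini, reusing the exact same hyperparameters (assumed interface)
from adam_mini import Adam_mini  # hypothetical import path
optimizer = Adam_mini(
    model.parameters(), lr=1e-3, betas=(0.9, 0.95), weight_decay=0.1
)

# The training loop itself is unchanged.
for _ in range(10):
    loss = model(torch.randn(4, 128)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is that no retuning is expected: you keep the learning rate, betas, and weight decay you already use for AdamW and only swap the optimizer class.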