Michael Pieler

MicPie

AI & ML interests

ML

Recent Activity

liked a Space about 1 month ago
k-mktr/gpu-poor-llm-arena

Organizations

MicPie's activity

Reacted to m-ric's post with 👀 4 months ago
๐—ง๐—ต๐—ฒ ๐—ต๐˜‚๐—ด๐—ฒ ๐—ฐ๐—ผ๐˜€๐˜ ๐—ผ๐—ณ ๐—ฟ๐—ฒ๐˜€๐—ฒ๐—ฎ๐—ฟ๐—ฐ๐—ต ๐—ผ๐—ป ๐—ณ๐—ฟ๐—ผ๐—ป๐˜๐—ถ๐—ฒ๐—ฟ ๐—Ÿ๐—Ÿ๐— ๐˜€ ๐Ÿ’ธ

Google DeepMind recently released a great paper that identifies optimal hyperparameters for training across different regimes: Scaling Exponents Across Parameterizations and Optimizers, with data from 10,000 training runs.

One engineer decided to quantify the price of such a large-scale experiment.

😬 And the bill is hefty: ~13M USD

This exact number should be taken with a grain of salt, since many approximations were needed to reach the final figure.
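
To see where a figure like this comes from, here is a minimal back-of-envelope sketch, assuming the common ~6·N·D rule of thumb for training FLOPs; the helper `training_cost_usd` and all hardware and price numbers below are hypothetical illustrations, not figures from the paper or the linked post.

```python
# Back-of-envelope LLM training cost estimate.
# All numbers are hypothetical; the linked blog post does far more
# detailed accounting across the paper's full sweep of runs.

def training_cost_usd(
    n_params: float,          # model parameters N
    n_tokens: float,          # training tokens D
    peak_flops: float,        # accelerator peak FLOP/s
    mfu: float,               # assumed model FLOPs utilization (0-1)
    usd_per_chip_hour: float, # assumed rental price per chip-hour
) -> float:
    """Estimate one run's cost via the ~6*N*D training-FLOPs approximation."""
    total_flops = 6 * n_params * n_tokens
    chip_seconds = total_flops / (peak_flops * mfu)
    return chip_seconds / 3600 * usd_per_chip_hour

# Example: a hypothetical 1B-parameter model trained on 20B tokens,
# on a 300 TFLOP/s chip at 40% utilization, rented at $2/chip-hour.
cost = training_cost_usd(1e9, 20e9, 300e12, 0.4, 2.0)
print(f"~${cost:,.0f} for a single run")
# A 10,000-run sweep sums this over every (model size, token count) pair.
```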

โ›”๏ธ But still this ballpark means that for this sole experiment, the price is way over what most startups or research labs could afford.

This means that open-sourcing research is more important than ever, to put everyone in the ecosystem on a roughly equal footing. Don't let OpenAI run first; they'll keep everything for themselves!

Read the full post that quantifies the paper's cost 👉 https://152334h.github.io/blog/scaling-exponents/
New activity in sfairXC/FsfairX-LLaMA3-RM-v0.1 7 months ago

Training details?

#2 opened 7 months ago by MicPie
New activity in JeanKaddour/minipile about 1 year ago

Domain and provenance annotation

#1 opened about 1 year ago by haukur
New activity in allenai/peS2o over 1 year ago