Zhaolin Gao

GitBag

https://zhaolingao.github.io/

AI & ML interests

Reinforcement Learning from Human Feedback

Recent Activity

updated a model 2 days ago

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e4_lr_3e-7_1738016708

published a model 2 days ago

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e4_lr_3e-7_1738016708

updated a model 2 days ago

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e2_lr_3e-7_1737991767

Articles

RLHF 101: A Technical Dive into RLHF

Dec 11, 2024

Organizations

Collections 1

Papers 3

arxiv:2410.04612

arxiv:2404.16767

arxiv:2402.10886

models 299

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e4_lr_3e-7_1738016708

Text Generation • Updated 2 days ago • 2

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e2_lr_3e-7_1737991767

Text Generation • Updated 2 days ago • 4

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e3_lr_3e-7_1738004267

Text Generation • Updated 2 days ago • 6

datasets 308

GitBag/llama3-uf-dp-from1735956551-token-rfst-1k3k_harvard

Viewer • Updated 4 days ago • 91.9k • 26

GitBag/llama3-uf-dp-from1735956551-token-rfst-1k3k

Viewer • Updated 5 days ago • 91.9k • 52

GitBag/llama3-uf-dp-from1735956551-token-st-1k3k_harvard

Viewer • Updated 5 days ago • 50.5k • 28

GitBag/llama3-uf-dp-from1735956551-token-rf-1k3k_harvard

Viewer • Updated 5 days ago • 41.4k • 55

GitBag/llama3-uf-dp-from1735956551-token-oa-1k3k

Viewer • Updated 5 days ago • 45.2k • 17

GitBag/llama3-uf-dp-from1735956551-token-st-1k3k

Viewer • Updated 5 days ago • 50.5k • 21

GitBag/llama3-uf-dp-from1735956551-token-rf-1k3k

Viewer • Updated 5 days ago • 41.4k • 18

GitBag/regenerated_responses_from_base_harvard

Viewer • Updated 13 days ago • 55.1k • 18

GitBag/llama3-uf-dp-from1735956551-same-turn

Viewer • Updated 16 days ago • 56.6k • 31

GitBag/llama3-uf-dp-from1735956551-reinforce

Viewer • Updated 16 days ago • 57.9k • 37

Collections 1

GitBag/gemma-2-9b-it-gsm8k

GitBag/llama-3_1-70b-it-gsm8k

GitBag/gemma-2-27b-it-gsm8k

GitBag/llama-3-8b-it-gsm8k

models 299

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e4_lr_3e-7_1738016708

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e2_lr_3e-7_1737991767

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e3_lr_3e-7_1738004267

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_st_eta_1e4_lr_3e-7_1737941473

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_st_eta_1e3_lr_3e-7_1737929737

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_st_eta_1e2_lr_3e-7_1737917960

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rf_eta_1e4_lr_3e-7_1737906394

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rf_eta_1e3_lr_3e-7_1737894763

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rf_eta_1e2_lr_3e-7_1737883170

GitBag/reasoning_rebel_nianli_lr_3e-7_eta_1e6_1737077907
