Zhaolin Gao (GitBag)
AI & ML interests: Reinforcement Learning from Human Feedback
Recent Activity
Updated a dataset 3 minutes ago: GitBag/llama3-ultrafeedback-reasoning-test
Updated a dataset about 11 hours ago: GitBag/llama3-ultrafeedback-reasoning-ReRe-armo-tokenized
Updated a model 3 days ago: GitBag/reasoning_rebel_iter_5_1731714556_eta_1e3_lr_3e-7_1731931011
Collections: 1
Models: 254
GitBag/reasoning_rebel_iter_5_1731714556_eta_1e3_lr_3e-7_1731931011 • Text Generation • 7
GitBag/reasoning_rebel_iter_5_1731714556_eta_1e2_lr_3e-7_1731926025 • Text Generation • 7
GitBag/reasoning_rebel_iter_5_1731714556_eta_1e1_lr_3e-7_1731903957 • Text Generation • 10
GitBag/reasoning_rebel_iter_5_1731714556_eta_1e4_lr_3e-7_1731935968 • Text Generation • 8
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e4_lr_3e-7_1731719519 • Text Generation • 11
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e3_lr_3e-7_1731714556 • Text Generation • 43
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e2_lr_3e-7_1731709582 • Text Generation • 9
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e1_lr_3e-7_1731686912 • Text Generation • 10
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e5_lr_3e-7_1731523653 • Text Generation • 11
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e6_lr_3e-7_1731528705 • Text Generation • 6
Datasets: 253
GitBag/llama3-ultrafeedback-reasoning-test • 378
GitBag/llama3-ultrafeedback-reasoning-ReRe-armo-tokenized • 229k • 2
GitBag/llama3-ultrafeedback-reasoning-iter_5-1731714556-armo-tokenized_harvard • 54.6k • 11
GitBag/llama3-ultrafeedback-reasoning-iter_5-1731714556-armo-tokenized • 54.6k • 4
GitBag/llama3-ultrafeedback-reasoning-iter_5-1731714556-armo • 60.8k • 8
GitBag/llama3-ultrafeedback-reasoning-iter_5-1731714556 • 60.8k • 9
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo-tokenized_harvard • 56.3k • 17
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo-tokenized • 56.3k • 13
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo • 60.8k • 13
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485 • 60.8k • 13