Hugging Face
XueyingJia/qwen2.5-0.5b-oaif
Transformers
Safetensors
XueyingJia/hh-rlhf-train-filtered
Generated from Trainer
trl
online-dpo
Inference Endpoints
arxiv:2402.04792
main
Commit History
End of training (2f08ab8), verified, XueyingJia committed 10 days ago
Model save (e4493ed), verified, XueyingJia committed 10 days ago
Training in progress, step 1316 (b936502), verified, XueyingJia committed 10 days ago
Training in progress, step 1000 (31571a2), verified, XueyingJia committed 10 days ago
Training in progress, step 500 (a23b18f), verified, XueyingJia committed 10 days ago
Training in progress, step 600 (0615cd0), verified, XueyingJia committed 10 days ago
Training in progress, step 500 (3e9db87), verified, XueyingJia committed 10 days ago
Training in progress, step 400 (18be825), verified, XueyingJia committed 10 days ago
Training in progress, step 300 (67c10c8), verified, XueyingJia committed 10 days ago
Training in progress, step 200 (4d8d942), verified, XueyingJia committed 10 days ago
Training in progress, step 100 (db88a15), verified, XueyingJia committed 10 days ago
initial commit (b2784e2), verified, XueyingJia committed 10 days ago