skaltenp/Qwen2-1.5B-DPO
Transformers
TensorBoard
Safetensors
trl-lib/ultrafeedback_binarized
Generated from Trainer
trl
dpo
Inference Endpoints
arxiv:2305.18290
main
Qwen2-1.5B-DPO/adapter_model.safetensors
Commit History
Training in progress, step 100 (94c654f, verified), skaltenp committed on Dec 3, 2024
Training in progress, step 50 (11e83e2, verified), skaltenp committed on Dec 3, 2024
Training in progress, step 237 (822844d, verified), skaltenp committed on Dec 3, 2024
Training in progress, step 200 (a61592c, verified), skaltenp committed on Dec 3, 2024
Training in progress, step 150 (e970685, verified), skaltenp committed on Dec 3, 2024
Training in progress, step 100 (f051c63, verified), skaltenp committed on Dec 3, 2024
Training in progress, step 50 (0ebe68a, verified), skaltenp committed on Dec 3, 2024