qnguyen3 committed on
Commit
dde5904
1 Parent(s): 99bb63d

Update README.md

Files changed (1)
  1. README.md +1 -66
README.md CHANGED
@@ -9,69 +9,4 @@ tags:
  model-index:
  - name: vylinh-dpo-v4
  results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # vylinh-dpo-v4
-
- This model is a fine-tuned version of [qnguyen3/vylinh-qwen-3b-merged](https://huggingface.co/qnguyen3/vylinh-qwen-3b-merged) on the dpo_mix_vi dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.4053
- - Rewards/chosen: 2.9905
- - Rewards/rejected: -1.6223
- - Rewards/accuracies: 0.7143
- - Rewards/margins: 4.6128
- - Logps/rejected: -596.4308
- - Logps/chosen: -706.0082
- - Logits/rejected: -2.3770
- - Logits/chosen: -2.8608
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-07
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 32
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 4.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
- |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.3646 | 0.9995 | 1368 | 0.3423 | 3.0067 | 0.1768 | 0.7143 | 2.8299 | -578.4399 | -705.8457 | -2.3914 | -2.8822 |
- | 0.1542 | 1.9996 | 2737 | 0.3495 | 2.9122 | -0.7397 | 0.75 | 3.6519 | -587.6050 | -706.7914 | -2.3921 | -2.8787 |
- | 0.0501 | 2.9998 | 4106 | 0.3891 | 3.0809 | -1.3165 | 0.7143 | 4.3974 | -593.3735 | -705.1045 | -2.3870 | -2.8676 |
- | 0.0124 | 3.9978 | 5472 | 0.4053 | 2.9905 | -1.6223 | 0.7143 | 4.6128 | -596.4308 | -706.0082 | -2.3770 | -2.8608 |
-
-
- ### Framework versions
-
- - Transformers 4.45.2
- - Pytorch 2.5.0+cu121
- - Datasets 2.21.0
- - Tokenizers 0.20.1
+ ---
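
Since this commit removes the only record of the training configuration, the following minimal sketch shows how the derived figures in the removed card appear to relate to the raw ones. It assumes the usual conventions (effective batch size = per-device batch size × number of devices × gradient accumulation steps, and reward margin = chosen reward minus rejected reward); the card itself does not state these formulas, and the helper names `effective_batch_size` and `margin` are hypothetical.

```python
# Values copied from the removed model card. The formulas below are the
# usual conventions and are an assumption; the card does not state them.
train_batch_size = 2
eval_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 4


def effective_batch_size(per_device, devices, accumulation=1):
    """Samples consumed per optimizer step (or per eval step if accumulation=1)."""
    return per_device * devices * accumulation


def margin(chosen, rejected):
    """Reward margin, rounded to the 4 decimals used in the card."""
    return round(chosen - rejected, 4)


# step -> (Rewards/chosen, Rewards/rejected, Rewards/margins as reported)
eval_rows = {
    1368: (3.0067, 0.1768, 2.8299),
    2737: (2.9122, -0.7397, 3.6519),
    4106: (3.0809, -1.3165, 4.3974),
    5472: (2.9905, -1.6223, 4.6128),
}

total_train = effective_batch_size(
    train_batch_size, num_devices, gradient_accumulation_steps
)
total_eval = effective_batch_size(eval_batch_size, num_devices)
margins = {step: margin(c, r) for step, (c, r, _) in eval_rows.items()}

print("total_train_batch_size:", total_train)
print("total_eval_batch_size:", total_eval)
for step, (_, _, reported) in eval_rows.items():
    print(step, "computed margin:", margins[step], "reported:", reported)
```

Under these assumptions the computed values match the reported `total_train_batch_size`, `total_eval_batch_size`, and `Rewards/margins` columns. The table also shows the margin widening while validation loss rises from 0.3423 to 0.4053 after the first epoch.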