waraml/ViLinh-3B

qnguyen3 committed 15 days ago

Commit dde5904 (1 parent: 99bb63d)

Update README.md

Files changed (1)

README.md +1 -66

@@ -9,69 +9,4 @@ tags:
 model-index:
 - name: vylinh-dpo-v4
   results: []
----
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# vylinh-dpo-v4
-This model is a fine-tuned version of [qnguyen3/vylinh-qwen-3b-merged](https://huggingface.co/qnguyen3/vylinh-qwen-3b-merged) on the dpo_mix_vi dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.4053
-- Rewards/chosen: 2.9905
-- Rewards/rejected: -1.6223
-- Rewards/accuracies: 0.7143
-- Rewards/margins: 4.6128
-- Logps/rejected: -596.4308
-- Logps/chosen: -706.0082
-- Logits/rejected: -2.3770
-- Logits/chosen: -2.8608
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-07
-- train_batch_size: 2
-- eval_batch_size: 2
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 4
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 32
-- total_eval_batch_size: 8
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 4.0
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
-|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.3646        | 0.9995 | 1368 | 0.3423          | 3.0067         | 0.1768           | 0.7143             | 2.8299          | -578.4399      | -705.8457    | -2.3914         | -2.8822       |
-| 0.1542        | 1.9996 | 2737 | 0.3495          | 2.9122         | -0.7397          | 0.75               | 3.6519          | -587.6050      | -706.7914    | -2.3921         | -2.8787       |
-| 0.0501        | 2.9998 | 4106 | 0.3891          | 3.0809         | -1.3165          | 0.7143             | 4.3974          | -593.3735      | -705.1045    | -2.3870         | -2.8676       |
-| 0.0124        | 3.9978 | 5472 | 0.4053          | 2.9905         | -1.6223          | 0.7143             | 4.6128          | -596.4308      | -706.0082    | -2.3770         | -2.8608       |
-### Framework versions
-- Transformers 4.45.2
-- Pytorch 2.5.0+cu121
-- Datasets 2.21.0
-- Tokenizers 0.20.1
+---
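
Note on the removed card: the derived figures it reported can be reproduced from the raw ones. The sketch below is only a sanity check on numbers copied from the removed lines, not the Trainer's own code. It assumes that the total batch sizes are the per-device size multiplied by the device count (and, for training, by the gradient accumulation steps), and that Rewards/margins is Rewards/chosen minus Rewards/rejected. The variable names are illustrative.

```python
# Values copied from the removed model card.
train_batch_size = 2             # per-device training batch size
eval_batch_size = 2              # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 4

# Assumption: totals are products of the per-device settings.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# (step, rewards/chosen, rewards/rejected, reported rewards/margins)
eval_rows = [
    (1368, 3.0067, 0.1768, 2.8299),
    (2737, 2.9122, -0.7397, 3.6519),
    (4106, 3.0809, -1.3165, 4.3974),
    (5472, 2.9905, -1.6223, 4.6128),
]

# Assumption: margin = chosen reward - rejected reward.
# The card rounds to four decimals, so compare with a small tolerance.
margin_matches = [
    abs((chosen - rejected) - reported) < 5e-4
    for _, chosen, rejected, reported in eval_rows
]

print(total_train_batch_size, total_eval_batch_size, all(margin_matches))
```

Under these assumptions the totals come out to 32 and 8, matching the removed card, and every reported margin agrees with chosen minus rejected to the printed precision.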