Update README.md
README.md CHANGED
@@ -19,7 +19,7 @@ This model is a **preference-aligned** version of the [previous SFT model](https
 ## Training Details
 - Base Model: SFT-tuned Llama-3-8B
 - Alignment Method: DPO (Direct Preference Optimization)
-- Training Infrastructure: DeepSpeed + FlashAttention 2, on 4 x 3090
+- Training Infrastructure: DeepSpeed (stage 1) + FlashAttention 2, on 4 x 3090
 - Training Duration: 1 epoch

 ## Training Data
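
For context on the updated line, here is a minimal reproduction sketch of a DPO run using DeepSpeed ZeRO stage 1 and FlashAttention 2, assuming TRL's `DPOTrainer`. Model/dataset paths and hyperparameters are placeholders, not the values used to train this model.

```python
# Sketch of the setup described in the card: DPO via TRL with DeepSpeed ZeRO
# stage 1 and FlashAttention 2. Paths, dataset, and hyperparameters below are
# placeholders, not the actual training configuration of this model.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "path/to/sft-tuned-llama-3-8b"  # placeholder for the SFT base model

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # FlashAttention 2
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data with "prompt" / "chosen" / "rejected" columns (placeholder).
train_dataset = load_dataset("path/to/preference-dataset", split="train")

# DeepSpeed ZeRO stage 1: shard optimizer states across the 4 GPUs.
ds_config = {
    "zero_optimization": {"stage": 1},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = DPOConfig(
    output_dir="llama-3-8b-dpo",
    num_train_epochs=1,                # 1 epoch, as stated in the card
    per_device_train_batch_size=1,     # placeholder; sized for 24 GB RTX 3090s
    gradient_accumulation_steps=8,     # placeholder
    bf16=True,
    deepspeed=ds_config,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,        # older TRL releases use tokenizer= instead
)
trainer.train()
```

A run like this would be launched on all four 3090s with something along the lines of `accelerate launch --num_processes 4 train_dpo.py` (or the `deepspeed` launcher); the exact launch command is an assumption.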