---
license: cc-by-nc-4.0
datasets:
- pankajmathur/orca_mini_v1_dataset
- openai/summarize_from_feedback
- PygmalionAI/PIPPA
- chargoddard/rpguild
- lemonilia/LimaRP
- PKU-Alignment/PKU-SafeRLHF
- Intel/orca_dpo_pairs
- argilla/ultrafeedback-binarized-preferences
---

Another experiment in the line of [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7).

Steps taken to produce this model:

* Train loyal-piano-m7
* cDPO with HuggingFaceH4/ultrafeedback_binarized to produce loyal-piano-m7-cdpo
* Train another model on a different sampling of the same source datasets as loyal-piano; call it servile-harpsichord
* cDPO servile-harpsichord with argilla/ultrafeedback-binarized-preferences, Intel/orca_dpo_pairs, and a helpfulness-only version of PKU-Alignment/PKU-SafeRLHF (see the sketch after this list)
* TIES merge several checkpoints of servile-harpsichord-cdpo with loyal-piano-m7-cdpo, using the configuration below

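As a rough illustration of the cDPO steps (not the exact training setup used here), here is a minimal sketch using TRL: it builds the helpfulness-only preference pairs from PKU-SafeRLHF, then runs `DPOTrainer` with label smoothing, which is what distinguishes cDPO from plain DPO. The model path and hyperparameters are placeholders, the PKU-SafeRLHF field names follow its dataset card, and the exact `DPOTrainer` arguments vary between TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Helpfulness-only view of PKU-SafeRLHF: keep the better_response_id
# ranking and ignore the safety labels entirely. Field names assume the
# published dataset schema.
raw = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")

def to_pair(ex):
    chosen = ex["better_response_id"]  # 0 or 1, helpfulness judgment only
    return {
        "prompt": ex["prompt"],
        "chosen": ex[f"response_{chosen}"],
        "rejected": ex[f"response_{1 - chosen}"],
    }

pairs = raw.map(to_pair, remove_columns=raw.column_names)

# Placeholder path for the pre-DPO servile-harpsichord model.
model = AutoModelForCausalLM.from_pretrained("path/to/servile-harpsichord")
tokenizer = AutoTokenizer.from_pretrained("path/to/servile-harpsichord")

# cDPO is DPO's sigmoid loss with label_smoothing > 0, which treats the
# preference labels as noisy. The values here are illustrative.
config = DPOConfig(
    output_dir="servile-harpsichord-cdpo",
    beta=0.1,
    loss_type="sigmoid",
    label_smoothing=0.3,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)
trainer = DPOTrainer(model=model, args=config, train_dataset=pairs,
                     processing_class=tokenizer)
trainer.train()
```
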
Local benchmarks show the result to be better than any of the individual components. Let's see if that holds up!

Trained using the Alpaca prompt format.
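
For reference, a single-turn Alpaca prompt (no-input variant) looks like this, assuming the standard template:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
```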

Configuration for final merge:
```yml
models:
  - model: chargoddard/loyal-piano-m7-cdpo
    parameters:
      density: 1.0
      weight: 1.0
  - model: /home/ubuntu/servile-harpsichord-cdpo/checkpoint-4186
    parameters:
      weight: 0.1
  - model: /home/ubuntu/servile-harpsichord-cdpo/checkpoint-5796
    parameters:
      weight: 0.2
  - model: /home/ubuntu/servile-harpsichord-cdpo/checkpoint-6118
    parameters:
      weight: 0.3
  - model: /home/ubuntu/servile-harpsichord-cdpo/final
    parameters:
      weight: 0.4
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
parameters:
  density: 0.4
  normalize: true
  int8_mask: true
```
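
A config like this should be reproducible with [mergekit](https://github.com/cg123/mergekit) (assuming a recent release; save the YAML above as `merge-config.yml`, and point the `/home/ubuntu/...` checkpoint paths at your own copies):

```sh
pip install mergekit
mergekit-yaml merge-config.yml ./merged-model
```

TIES trims low-magnitude parameter deltas (keeping the `density` fraction) and resolves sign conflicts before averaging, which lets several checkpoints of the same run blend without washing each other out.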