Update README.md
README.md
CHANGED
@@ -161,9 +161,9 @@ Some instruction datasets are added for curiosity sake although model is not tra
|
|
161 |
- The model does not perform well enough to tell educational value in instruction datasets.
|
162 |
|
163 |
# 📈Analysis
|
164 |
-
## 🤖Model
|
165 |
The expectation is that the model trained with filter will outperform model trained without the filter.
|
166 |
-
Fineweb is filtered on the fly with Educational Value >= 1.
|
167 |
|
168 |
Test 1:
|
169 |
Model params: 192M
|
@@ -178,7 +178,7 @@ Training token: 3.1B training token, 6000 global steps
|
|
178 |
|TruthfulQA| 45.88 | 45.20| 45.97|
|
179 |
|Winogrande| 49.49 | 50.59 | 50.67 |
|
180 |
|
181 |
-
The reasoning and commensense reasoning seems to be better when
|
182 |
MMLU is better also; however it is close to random due to limitation in compute (both training time and model size).
|
183 |
Model of larger size will be trained to further validate this claim.
|
184 |
|
@@ -192,3 +192,6 @@ The first 10M records have been analysed. Full file in [here](https://drive.goo
|
|
192 |
Below is the top 100 domain names, with no of record >= 100.
|
193 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/60e50ce5350d181892d5a636/3QNYYVbFIqaAUh-574lED.png)
|
194 |
|
161 |
- The model does not perform well enough to tell educational value in instruction datasets.
|
162 |
|
163 |
# 📈Analysis
|
164 |
+
## 🤖Model Training With And Without Classifier
|
165 |
The expectation is that the model trained with filter will outperform model trained without the filter.
|
166 |
+
Fineweb is filtered on the fly with Educational Value >= 1.0.
|
167 |
|
168 |
Test 1:
|
169 |
Model params: 192M
|
|
|
178 |
|TruthfulQA| 45.88 | 45.20| 45.97|
|
179 |
|Winogrande| 49.49 | 50.59 | 50.67 |
|
180 |
|
181 |
+
Reasoning and commonsense reasoning scores seem to be better when the filter is on, which aligns with the expectation. They are also close to the Cosmopedia results.
|
182 |
MMLU is better also; however it is close to random due to limitation in compute (both training time and model size).
|
183 |
Model of larger size will be trained to further validate this claim.
|
184 |
|
|
|
192 |
Below is the top 100 domain names, with no of record >= 100.
|
193 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/60e50ce5350d181892d5a636/3QNYYVbFIqaAUh-574lED.png)
|
194 |
|
195 |
+
## 🧪Classifier Ranking Ordering
|
196 |
+
The Spearman rank-order correlation coefficient between the classifier's Educational Value and that of the test data is 0.7055, indicating a strong monotonic relationship. The Educational Value can therefore be used for ranking.
|
197 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/60e50ce5350d181892d5a636/dKV2oXRv3WpEsfDXy0bl7.png)
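
For readers who want to reproduce a check like the one in the Classifier Ranking Ordering section, here is a minimal sketch of a Spearman rank-order correlation in plain Python. It is not the code used for the README. The function names (`average_ranks`, `spearman`) and the small `classifier_scores` / `test_scores` lists are hypothetical and purely illustrative, and it assumes tied values receive the average of their rank positions.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Illustrative data only: made-up classifier outputs and reference scores.
classifier_scores = [0.2, 1.4, 0.9, 2.8, 3.1, 1.7]
test_scores = [0, 1, 2, 3, 3, 1]
rho = spearman(classifier_scores, test_scores)
print(f"Spearman rho = {rho:.4f}")
```

Because only the ranks enter the computation, any strictly increasing transformation of either score leaves the coefficient unchanged, which is why it suits a question about ranking rather than calibration.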
|