---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- ModernBERT
- fineweb
- filtering
- regression
metrics:
- precision
- recall
- accuracy
model-index:
- name: 8e-5_one_label
  results: []
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
---

One-off run using a [modified version](https://gist.github.com/bclavie/93d3b161d7fb41131bca41a50b6726c5) of the original FineWeb-Edu quality filter regression training code, simply replacing the original backbone (snowflake-arctic-embed-m, a model fine-tuned from BERT-base) with ModernBERT-base.

Without extensive tuning, the model trains considerably faster than BERT-base and gets **+5 Weighted F1**:

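The model is trained as a regressor on the 0–5 educational-value annotations, so at evaluation time the continuous output has to be mapped back to integer labels before a classification report can be computed. A minimal sketch of that post-processing, assuming the same round-and-clip convention (and the score ≥ 3 "educational" cutoff) used by the original FineWeb-Edu classifier; the function name is illustrative, not from the training script:

```python
import torch

def scores_to_labels(raw_scores: torch.Tensor) -> torch.Tensor:
    """Map continuous regression outputs to integer 0-5 quality labels,
    mirroring the round-and-clip step of the FineWeb-Edu classifier."""
    return raw_scores.round().clamp(0, 5).long()

# Illustrative raw outputs from the regression head:
raw = torch.tensor([2.7, -0.3, 5.4, 1.1])
labels = scores_to_labels(raw)  # tensor([3, 0, 5, 1])
is_edu = labels >= 3            # FineWeb-Edu keeps documents scoring >= 3
```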
## Results

### ModernBERT-base-fineweb-edu-example

**Weighted F1: 0.76**

**Detailed:**

```
Validation Report:
              precision    recall  f1-score   support

           0       0.80      0.55      0.65      5694
           1       0.82      0.86      0.84     26512
           2       0.64      0.71      0.67     10322
           3       0.65      0.60      0.63      3407
           4       0.80      0.37      0.51       807
           5       0.00      0.00      0.00         1

    accuracy                           0.76     46743
   macro avg       0.62      0.51      0.55     46743
weighted avg       0.76      0.76      0.76     46743
```

### Original Classifier ([HuggingFaceFW/fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier))

**Weighted F1: 0.71**

**Detailed:**

```
              precision    recall  f1-score   support

           0       0.75      0.49      0.59      5694
           1       0.78      0.84      0.81     26512
           2       0.57      0.61      0.59     10322
           3       0.56      0.50      0.53      3407
           4       0.58      0.35      0.44       807
           5       0.33      0.01      0.02       125

    accuracy                           0.71     46867
   macro avg       0.60      0.47      0.50     46867
weighted avg       0.71      0.71      0.71     46867
```

(For some reason, the currently available annotated dataset is identical, except that it is missing 124 of the 125 examples rated 5. These are so rare that they have no real impact on the weighted metrics.)

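As a sanity check, the weighted F1 reported above is just the support-weighted mean of the per-class F1 scores, which is also why the lone 5-rated example cannot move it. Recomputing it from the ModernBERT table (values copied from the validation report above):

```python
# Per-class F1 and support, copied from the ModernBERT validation report.
f1s      = [0.65, 0.84, 0.67, 0.63, 0.51, 0.00]
supports = [5694, 26512, 10322, 3407, 807, 1]

total = sum(supports)  # 46743 validation examples
weighted_f1 = sum(f * s for f, s in zip(f1s, supports)) / total
print(round(weighted_f1, 2))  # 0.76, matching the headline number
```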
## Params

Most parameters are detailed in the script. Key hyperparameters:

- **Learning Rate**: 5e-5
- **Weight Decay**: 0.1 (decoupled)
- **Seed**: 1
- **Warmup**: 10% of steps
- **Schedule**: Linear decay
- **Max epochs**: 10
- **Best epoch**: #3
- **Precision**: bfloat16
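For reference, a minimal sketch of what the Warmup and Schedule entries above compute, assuming the learning rate ramps linearly from 0 to the peak over the first 10% of steps and then decays linearly to 0; the actual script may parameterize this differently:

```python
def lr_at(step: int, total_steps: int,
          peak_lr: float = 5e-5, warmup_frac: float = 0.1) -> float:
    """Linear warmup over the first 10% of steps, then linear decay to 0,
    matching the Warmup/Schedule entries (peak LR from the Params list)."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr at the end of warmup down to 0 at total_steps.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# e.g. with 1000 total steps: peak 5e-5 at step 100, halfway back down at step 550
```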