---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- ModernBERT
- fineweb
- filtering
- regression
metrics:
- precision
- recall
- accuracy
model-index:
- name: 8e-5_one_label
  results: []
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
---

One-off run using a [modified version](https://gist.github.com/bclavie/93d3b161d7fb41131bca41a50b6726c5) of the original FineWeb-Edu quality filter regression training code, simply replacing the original model (snowflake-arctic-embed-m, a model fine-tuned from BERT-base) with ModernBERT-base.

Without extensive tuning, the model trains considerably faster than BERT-base and gains **+5 weighted F1**:

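The swap itself is small: the quality filter is a single-output sequence-classification head trained as a regressor. A minimal sketch of that configuration (illustrative, not the training script itself; `from_config` is used here only to instantiate the architecture without downloading weights, whereas the actual run fine-tunes from the pretrained checkpoint):

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# num_labels=1 with problem_type="regression" gives a single-score head
# trained with MSE loss on the continuous 0-5 educational-quality score.
# (ModernBERT support requires a recent transformers release, >= 4.48.)
config = AutoConfig.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=1,
    problem_type="regression",
)
model = AutoModelForSequenceClassification.from_config(config)
```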
## Results

### ModernBERT-base-fineweb-edu-example

**Weighted F1: 0.76**

**Detailed:**

```
Validation Report:
              precision    recall  f1-score   support

           0       0.80      0.55      0.65      5694
           1       0.82      0.86      0.84     26512
           2       0.64      0.71      0.67     10322
           3       0.65      0.60      0.63      3407
           4       0.80      0.37      0.51       807
           5       0.00      0.00      0.00         1

    accuracy                           0.76     46743
   macro avg       0.62      0.51      0.55     46743
weighted avg       0.76      0.76      0.76     46743
```

### [Original Classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier)

**Weighted F1: 0.71**

**Detailed:**

```
              precision    recall  f1-score   support

           0       0.75      0.49      0.59      5694
           1       0.78      0.84      0.81     26512
           2       0.57      0.61      0.59     10322
           3       0.56      0.50      0.53      3407
           4       0.58      0.35      0.44       807
           5       0.33      0.01      0.02       125

    accuracy                           0.71     46867
   macro avg       0.60      0.47      0.50     46867
weighted avg       0.71      0.71      0.71     46867
```

(For some reason, the currently available annotated dataset is identical except that it is missing 124 of the 125 5-rated examples. These examples are so few that they have no real impact on the weighted metrics.)

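The integer labels in the reports above come from the regressor's continuous score. A sketch of the usual round-and-clip mapping (assuming the same convention as the FineWeb-Edu classifier's integer score; the function name and values here are illustrative):

```python
import numpy as np
from sklearn.metrics import classification_report

# The regressor predicts a continuous quality score; for reports like
# the ones above it is clipped to [0, 5] and rounded to an integer.
def to_int_labels(scores):
    return np.clip(np.rint(scores), 0, 5).astype(int)

# Illustrative values only:
y_true = np.array([0, 1, 1, 2, 3, 4])
y_pred = to_int_labels(np.array([0.4, 1.2, 0.7, 2.2, 2.9, 5.6]))
print(classification_report(y_true, y_pred))
```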
## Params

Most parameters are detailed in the script. Key hyperparameters:

- **Learning Rate**: 5e-5
- **Weight Decay**: 0.1 (decoupled)
- **Seed**: 1
- **Warmup**: 10% of steps
- **Schedule**: Linear decay
- **Max epochs**: 10
- **Best epoch**: #3
- **Precision**: bfloat16
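Mapped onto `transformers`' `TrainingArguments`, the hyperparameters above would look roughly like this (a sketch, not the script's actual configuration; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="modernbert-fineweb-edu",  # placeholder
    learning_rate=5e-5,
    weight_decay=0.1,           # decoupled decay, as applied by AdamW
    seed=1,
    warmup_ratio=0.1,           # warmup over 10% of steps
    lr_scheduler_type="linear", # linear decay after warmup
    num_train_epochs=10,
    bf16=True,                  # bfloat16 training
)
```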