---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- ModernBERT
- fineweb
- filtering
- regression
metrics:
- precision
- recall
- accuracy
model-index:
- name: 8e-5_one_label
  results: []
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
---

One-off run using a [modified version](https://gist.github.com/bclavie/93d3b161d7fb41131bca41a50b6726c5) of the original FineWeb-Edu quality filter regression training code, simply replacing the original model (snowflake-arctic-embed-m, a model fine-tuned from BERT-base) with ModernBERT-base.

Without extensive tuning, the model trains considerably faster than BERT-base and gains **+5 weighted F1**:

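The swap itself is small: the quality filter is a single-output sequence-classification head trained as a regressor. A minimal sketch of that configuration (illustrative, not the training script itself; `from_config` is used here only to instantiate the architecture without downloading weights, whereas the actual run fine-tunes from the pretrained checkpoint):

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# num_labels=1 with problem_type="regression" gives a single-score head
# trained with MSE loss on the continuous 0-5 educational-quality score.
# (ModernBERT support requires a recent transformers release, >= 4.48.)
config = AutoConfig.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=1,
    problem_type="regression",
)
model = AutoModelForSequenceClassification.from_config(config)
```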
## Results

### ModernBERT-base-fineweb-edu-example

**Weighted F1: 0.76**

**Detailed:**

```
Validation Report:
              precision    recall  f1-score   support

           0       0.80      0.55      0.65      5694
           1       0.82      0.86      0.84     26512
           2       0.64      0.71      0.67     10322
           3       0.65      0.60      0.63      3407
           4       0.80      0.37      0.51       807
           5       0.00      0.00      0.00         1

    accuracy                           0.76     46743
   macro avg       0.62      0.51      0.55     46743
weighted avg       0.76      0.76      0.76     46743
```

### [Original Classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier)

**Weighted F1: 0.71**

**Detailed:**

```
              precision    recall  f1-score   support

           0       0.75      0.49      0.59      5694
           1       0.78      0.84      0.81     26512
           2       0.57      0.61      0.59     10322
           3       0.56      0.50      0.53      3407
           4       0.58      0.35      0.44       807
           5       0.33      0.01      0.02       125

    accuracy                           0.71     46867
   macro avg       0.60      0.47      0.50     46867
weighted avg       0.71      0.71      0.71     46867
```

(For some reason, the currently available annotated dataset is identical except that it is missing 124 of the 125 5-rated examples. These examples are so few that they have no real impact on the weighted metrics.)

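The integer labels in the reports above come from the regressor's continuous score. A sketch of the usual round-and-clip mapping (assuming the same convention as the FineWeb-Edu classifier's integer score; the function name and values here are illustrative):

```python
import numpy as np
from sklearn.metrics import classification_report

# The regressor predicts a continuous quality score; for reports like
# the ones above it is clipped to [0, 5] and rounded to an integer.
def to_int_labels(scores):
    return np.clip(np.rint(scores), 0, 5).astype(int)

# Illustrative values only:
y_true = np.array([0, 1, 1, 2, 3, 4])
y_pred = to_int_labels(np.array([0.4, 1.2, 0.7, 2.2, 2.9, 5.6]))
print(classification_report(y_true, y_pred))
```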
## Params

Most parameters are detailed in the script. Key hyperparameters:

- **Learning Rate**: 5e-5
- **Weight Decay**: 0.1 (decoupled)
- **Seed**: 1
- **Warmup**: 10% of steps
- **Schedule**: Linear decay
- **Max epochs**: 10
- **Best epoch**: #3
- **Precision**: bfloat16
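Mapped onto `transformers`' `TrainingArguments`, the hyperparameters above would look roughly like this (a sketch, not the script's actual configuration; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="modernbert-fineweb-edu",  # placeholder
    learning_rate=5e-5,
    weight_decay=0.1,           # decoupled decay, as applied by AdamW
    seed=1,
    warmup_ratio=0.1,           # warmup over 10% of steps
    lr_scheduler_type="linear", # linear decay after warmup
    num_train_epochs=10,
    bf16=True,                  # bfloat16 training
)
```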