Commit fdfc08b by nikitastheo (parent: cf41bd3)
Update README.md
README.md (changed)

```diff
@@ -9,4 +9,6 @@ This model uses the LTG-BERT architecture.
 The model was trained on a combination of the BabyLM Dataset, the TinyStories Dataset, and generated data,
 in accordance with the rules of the Strict-Small track and the 10M word budget.
 
+The model was trained with a 128-token sequence length.
+
 Hyperparameters used and evaluation scores will follow in a subsequent update.
```
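The README states that the training data stays within the Strict-Small 10M word budget but does not say how words were counted. The sketch below assumes words are counted by splitting on whitespace. BabyLM's own counting convention may differ. `count_words`, `within_budget`, and the toy corpora are hypothetical stand-ins, not code from this repository.

```python
# Minimal sketch: check combined corpora against a word budget.
# ASSUMPTION: a "word" is a whitespace-separated token; the README does not specify this.
from typing import Iterable

# Strict-Small word budget stated in the README (10M words).
WORD_BUDGET = 10_000_000


def count_words(lines: Iterable[str]) -> int:
    """Count whitespace-separated words across all lines."""
    return sum(len(line.split()) for line in lines)


def within_budget(sources: dict[str, list[str]], budget: int = WORD_BUDGET) -> bool:
    """Return True if the combined word count of all sources fits the budget."""
    total = sum(count_words(lines) for lines in sources.values())
    return total <= budget


# Hypothetical toy corpora standing in for BabyLM, TinyStories, and generated data.
sources = {
    "babylm": ["the cat sat on the mat"],
    "tinystories": ["once upon a time there was a dog"],
    "generated": ["a small generated sentence"],
}
print(count_words(sources["babylm"]))  # 6
print(within_budget(sources))  # True
```

Because the budget applies to the combined data, the check sums all three sources before comparing the total against the limit.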
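The line added in this commit gives a 128-token sequence length but does not describe how training examples were built. One common approach concatenates tokenized documents and cuts the stream into fixed 128-token blocks, dropping any leftover tokens. The sketch below shows that approach as an assumption, not as the author's pipeline. It works on plain integer token IDs and assumes no particular tokenizer. `chunk_tokens` is a hypothetical helper.

```python
# Minimal sketch: pack token IDs into fixed-length training sequences.
# ASSUMPTION: concatenate-then-split packing with the remainder dropped;
# the README only states the 128-token length.

SEQ_LEN = 128  # sequence length stated in the README


def chunk_tokens(docs: list[list[int]], seq_len: int = SEQ_LEN) -> list[list[int]]:
    """Concatenate token-ID lists and split them into full blocks of seq_len.

    Trailing tokens that do not fill a complete block are dropped.
    """
    stream = [tok for doc in docs for tok in doc]
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


# Two toy "documents" totalling 300 token IDs.
docs = [list(range(100)), list(range(100, 300))]
blocks = chunk_tokens(docs)
print(len(blocks))  # 2
print(all(len(b) == SEQ_LEN for b in blocks))  # True
```

Padding the final partial block with a pad token instead of dropping it is an equally plausible design. The README does not say which approach, if either, was used.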