Update README.md
README.md
CHANGED
@@ -36,24 +36,22 @@ inference: false
 </p>
 </blockquote>
 
+Fietje is an adapted version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), tailored to Dutch text generation by training on 28B Dutch tokens. It is small and efficient, with a size of 2.7 billion parameters, while performing almost on par with more powerful Dutch LLMs of twice its size, such as [GEITje 7B Ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra).
 
-
-This model is an adapted version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), finetuned for Dutch text generation. It was continue-pretrained on 28B Dutch tokens, which includes the full Dutch component of Wikipedia (accounting for around 15%), supplemented with Dutch tokens from CulturaX. A newer version of this dataset can be found [here](https://huggingface.co/datasets/BramVanroy/wikipedia_culturax_dutch), which also describes the filtering that took place.
-
-## Model description
-
-More information needed
+A thorough description of the creation and evaluation of Fietje, as well as usage examples, is available in [this GitHub repository](https://github.com/BramVanroy/fietje).
 
 ## Intended uses & limitations
 
-
+The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!
 
-## Training
+## Training data
 
-
+Fietje was continue-pretrained on 28B Dutch tokens, which include the full Dutch component of Wikipedia (accounting for around 15% of the total), supplemented with Dutch tokens from CulturaX. A newer version of this dataset, which also documents the filtering applied to ensure high data quality, can be found [here](https://huggingface.co/datasets/BramVanroy/wikipedia_culturax_dutch).
 
 ## Training procedure
 
+I am grateful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power needed for this project. Including time spent waiting for jobs, training took around two weeks on four nodes with 4x A100 80GB GPUs each (16 GPUs in total).
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
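The updated card defers to the GitHub repository for usage examples. For quick reference, a minimal text-generation sketch with 🤗 Transformers might look like the following; note that the model id used here is an assumption for illustration and should be replaced with this repository's actual id on the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id, for illustration only; replace with the actual Hub id of this model.
model_id = "BramVanroy/fietje-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the `accelerate` package; drop it to load on CPU instead.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Fietje is a base (continue-pretrained) model, so prompt it with plain Dutch text to complete.
prompt = "De hoofdstad van Nederland is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```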
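The training-data section links to the BramVanroy/wikipedia_culturax_dutch dataset on the Hugging Face Hub, so it can be inspected directly with the `datasets` library. A minimal sketch follows; the card does not list the dataset's subset (config) names or splits, so the configs are queried rather than assumed, and the example assumes a "train" split exists.

```python
from datasets import get_dataset_config_names, load_dataset

dataset_id = "BramVanroy/wikipedia_culturax_dutch"

# The card does not list the subset names, so query them instead of guessing.
configs = get_dataset_config_names(dataset_id)
print(configs)

# Stream a subset to peek at a few examples without downloading the full corpus;
# pick the config you actually need from the printed list. Assumes a "train" split.
ds = load_dataset(dataset_id, configs[0], split="train", streaming=True)
for i, example in enumerate(ds):
    print(example)
    if i == 2:
        break
```

Streaming is used here because the underlying corpus is on the order of 28B tokens; it lets you sample records without committing to a full download.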