szymonrucinski committed
Commit: f15815f
Parent: 7b7a4e3

Update README.md

Files changed (1)
  1. README.md +13 -1
README.md CHANGED
@@ -7,7 +7,19 @@ tags:
 - polish
 - nlp
 ---
-# Curie-7B-v1 Model
+<style>
+@import url('https://fonts.googleapis.com/css2?family=Pacifico&display=swap');
+.markdown-custom-font {
+  font-family: "Pacifico", cursive;
+  font-weight: 400;
+  font-style: normal;
+}
+</style>
+
+<div class="markdown-custom-font" align="center">
+  <img src="logo.png" alt="Logo" width="300">
+  Curie-7B-v1
+</div>
 
 ## Introduction
 This research demonstrates the potential of fine-tuning English Large Language Models (LLMs) for Polish text generation. By employing Language Adaptive Pre-training (LAPT) on a high-quality dataset of 3.11 GB (276 million Polish tokens) and subsequent fine-tuning on the [KLEJ challenges](https://klejbenchmark.com), the `Curie-7B-v1` model achieves remarkable performance. It not only generates Polish text with the lowest perplexity (3.02) among decoder-based models, but also closely rivals the best Polish encoder-decoder models, with a minimal performance gap on 8 of 9 tasks. This was accomplished using only about 2-3% of the dataset size typically required, showcasing the method's efficiency. The model is now open-source, contributing to the community's collaborative progress.
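
For readers who want to try the model described above, a minimal inference sketch follows. It assumes the checkpoint is published on the Hugging Face Hub under `szymonrucinski/Curie-7B-v1` (inferred from the committer and model name, not stated in this commit) and uses the standard `transformers` text-generation API; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: load Curie-7B-v1 and generate Polish text.
# Assumes the model is available as "szymonrucinski/Curie-7B-v1" on the Hub;
# adjust the repo id, device, and generation settings to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "szymonrucinski/Curie-7B-v1"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 7B model on a single GPU
    device_map="auto",          # requires the `accelerate` package
)

prompt = "Najważniejszym polskim poetą romantycznym był"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```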