szymonrucinski committed
Commit: f15815f • Parent(s): 7b7a4e3

Update README.md

README.md CHANGED
@@ -7,7 +7,19 @@ tags:
 - polish
 - nlp
 ---
-
+<style>
+@import url('https://fonts.googleapis.com/css2?family=Pacifico&display=swap')
+.markdown-custom-font {
+  font-family: "Pacifico", cursive;
+  font-weight: 400;
+  font-style: normal;
+}
+</style>
+
+<div class="markdown-custom-font" align="center">
+  <img src="logo.png" alt="Logo" width="300">
+  Curie-7B-v1
+</div>
 
 ## Introduction
 This research demonstrates the potential of fine-tuning English Large Language Models (LLMs) for Polish text generation. By employing Language Adaptive Pre-training (LAPT) on a high-quality dataset of 3.11 GB (276 million Polish tokens) and subsequent fine-tuning on the [KLEJ challenges](https://klejbenchmark.com), the `Curie-7B-v1` model achieves remarkable performance. It not only generates Polish text with the lowest perplexity of 3.02 among decoder-based models but also rivals the best Polish encoder-decoder models closely, with a minimal performance gap on 8 out of 9 tasks. This was accomplished using about 2-3% of the dataset size typically required, showcasing the method's efficiency. The model is now open-source, contributing to the community's collaborative progress.
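
As a companion to the updated README, here is a minimal sketch of how the model could be loaded and its perplexity measured on a short Polish text with the Hugging Face Transformers library. The repo id `szymonrucinski/Curie-7B-v1` and the sample sentence are assumptions for illustration and are not part of this commit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id; adjust if the model is hosted elsewhere.
MODEL_ID = "szymonrucinski/Curie-7B-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)

# A short Polish sample sentence (illustrative, not from the evaluation set).
text = "Kot siedzi na macie i patrzy przez okno."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # With labels equal to input_ids, the returned loss is the mean
    # cross-entropy over predicted tokens; exp(loss) is the perplexity.
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"Perplexity: {torch.exp(outputs.loss).item():.2f}")
```

Perplexity measured this way on a single sentence will vary with the text; the 3.02 figure cited in the README refers to the model's evaluation corpus, not to arbitrary inputs.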