NbAiLab/nb-llama-3.1-8B

@@ -85,7 +85,7 @@ Please note that this is still a research project, and the purpose of releasing t
 ```python
 import transformers
-model_id = "north/nb-llama-3.1-8B"
 pipeline = transformers.pipeline(
     "text-generation",
@@ -116,7 +116,7 @@ Parts of the following publicly available datasets were used:
 ### Data Selection
-To ensure the highest quality training data, only a small subset of the original raw data was used. An encoder-only classifier built on [nb-bert-base](https://huggingface.co/NbAiLab/nb-bert-base) was trained to evaluate both educational value and linguistic quality of the training samples.
 - **Categorization Methods:**
   - Inspired by the [FineWeb](https://example.com/FineWeb) project.
@@ -136,4 +136,4 @@ The model is released under the [Llama 3.1 Community License](https://github.com
 ---
 ### Citing & Authors
-The model was trained and documentation written by Per Egil Kummervold.

 ```python
 import transformers
+model_id = "NbAiLab/nb-llama-3.1-8B"
 pipeline = transformers.pipeline(
     "text-generation",
 ### Data Selection
+To ensure the highest quality training data, only a small subset of the original raw data was used. [Corpus Quality Classifiers](https://huggingface.co/collections/NbAiLab/corpus-quality-classifier-673f15926c2774fcc88f23aa) built on [nb-bert-base](https://huggingface.co/NbAiLab/nb-bert-base) were trained to evaluate both educational value and linguistic quality of the training samples. These models are released along with the NB-Llama-3.x models, and are considered the main output from this initiative.
 - **Categorization Methods:**
   - Inspired by the [FineWeb](https://example.com/FineWeb) project.
 ---
 ### Citing & Authors
+The model was trained, and its documentation written, by Per Egil Kummervold as part of the NoTraM project.