---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- generated_from_trainer
datasets:
- generator
model-index:
- name: GEITje-v1-7B
  results: []
---

# GEITje-v1-7B

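GEITje-v1-7B is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). The Trainer recorded the training dataset only as `generator`, which is typically the name logged when a dataset is built from a Python generator rather than loaded by name.
At the last logged evaluation (step 9353), it achieves the following result on the evaluation set: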
- Loss: 1.3943
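
For readers who prefer perplexity, the loss above can be converted with a one-line calculation. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card); the function name `perplexity` is illustrative.

```python
import math

# Final evaluation loss reported in this card.
EVAL_LOSS = 1.3943


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


print(f"{perplexity(EVAL_LOSS):.2f}")  # 4.03
```

The resulting value is tied to the model's tokenizer, so it is not directly comparable with models that tokenize text differently.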

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 953
- training_steps: 9536
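
The totals in this list follow from the per-device values, and the warmup covers roughly the first 10% of training. The sketch below reproduces that arithmetic and evaluates a learning-rate curve under an assumed schedule shape: linear warmup from zero to the peak rate, then a half-cosine decay to zero at the final step. The exact schedule used by the Trainer is not spelled out in this card, so `lr_at` is an illustration rather than the training code.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 8
PEAK_LR = 2e-5
WARMUP_STEPS = 953
TRAINING_STEPS = 9536


def total_train_batch_size() -> int:
    """Sequences consumed per optimizer step across all devices."""
    return PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS


def total_eval_batch_size() -> int:
    """Sequences evaluated per forward pass across all devices."""
    return PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def lr_at(step: int) -> float:
    """Assumed shape: linear warmup from 0, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size())  # 128
print(total_eval_batch_size())   # 16
for step in (0, WARMUP_STEPS, 5000, TRAINING_STEPS):
    print(step, lr_at(step))
```

Under these assumptions the rate peaks at 2e-05 at step 953 and reaches zero at step 9536.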

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6995        | 0.02  | 199  | 1.7673          |
| 1.6949        | 0.04  | 398  | 1.6880          |
| 1.6377        | 0.06  | 597  | 1.6429          |
| 1.6011        | 0.08  | 796  | 1.6384          |
| 1.5196        | 0.10  | 995  | 1.6060          |
| 1.5158        | 0.13  | 1194 | 1.5832          |
| 1.5181        | 0.15  | 1393 | 1.5541          |
| 1.4931        | 0.17  | 1592 | 1.5493          |
| 1.4972        | 0.19  | 1791 | 1.5407          |
| 1.5349        | 0.21  | 1990 | 1.5305          |
| 1.5025        | 0.23  | 2189 | 1.5263          |
| 1.3960        | 0.25  | 2388 | 1.5140          |
| 1.4353        | 0.27  | 2587 | 1.5104          |
| 1.4307        | 0.29  | 2786 | 1.5003          |
| 1.3974        | 0.31  | 2985 | 1.4849          |
| 1.4040        | 0.33  | 3184 | 1.4771          |
| 1.4299        | 0.35  | 3383 | 1.4825          |
| 1.4342        | 0.38  | 3582 | 1.4705          |
| 1.4341        | 0.40  | 3781 | 1.4643          |
| 1.4535        | 0.42  | 3980 | 1.4580          |
| 1.4799        | 0.44  | 4179 | 1.4521          |
| 1.3500        | 0.46  | 4378 | 1.4478          |
| 1.4586        | 0.48  | 4577 | 1.4425          |
| 1.3685        | 0.50  | 4776 | 1.4368          |
| 1.4572        | 0.52  | 4975 | 1.4313          |
| 1.3293        | 0.54  | 5174 | 1.4265          |
| 1.4030        | 0.56  | 5373 | 1.4241          |
| 1.3057        | 0.58  | 5572 | 1.4188          |
| 1.2440        | 0.61  | 5771 | 1.4178          |
| 1.3224        | 0.63  | 5970 | 1.4110          |
| 1.3238        | 0.65  | 6169 | 1.4083          |
| 1.3262        | 0.67  | 6368 | 1.4050          |
| 1.3237        | 0.69  | 6567 | 1.4027          |
| 1.0453        | 0.71  | 6766 | 1.4005          |
| 1.3136        | 0.73  | 6965 | 1.3992          |
| 1.3137        | 0.75  | 7164 | 1.3975          |
| 1.1587        | 0.77  | 7363 | 1.3964          |
| 1.3160        | 0.79  | 7562 | 1.3957          |
| 1.2738        | 0.81  | 7761 | 1.3951          |
| 1.3080        | 0.83  | 7960 | 1.3949          |
| 1.4049        | 0.86  | 8159 | 1.3946          |
| 1.3324        | 0.88  | 8358 | 1.3944          |
| 1.3446        | 0.90  | 8557 | 1.3944          |
| 1.2489        | 0.92  | 8756 | 1.3943          |
| 1.2687        | 0.94  | 8955 | 1.3943          |
| 1.3293        | 0.96  | 9154 | 1.3943          |
| 1.3045        | 0.98  | 9353 | 1.3943          |
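
The table logs one evaluation every 199 steps. To work with these numbers programmatically (for plotting, say), the rows can be parsed directly from the Markdown. The sketch below uses four rows copied from the table; the helper name `parse_rows` and the dictionary keys are illustrative choices, not part of any library.

```python
# Four rows copied verbatim from the results table above.
ROWS = """
| 1.6995        | 0.02  | 199  | 1.7673          |
| 1.3974        | 0.31  | 2985 | 1.4849          |
| 1.3238        | 0.65  | 6169 | 1.4083          |
| 1.3045        | 0.98  | 9353 | 1.3943          |
"""


def parse_rows(text: str) -> list[dict]:
    """Turn Markdown table rows into a list of typed records."""
    records = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss = cells
        records.append(
            {
                "train_loss": float(train_loss),
                "epoch": float(epoch),
                "step": int(step),
                "val_loss": float(val_loss),
            }
        )
    return records


records = parse_rows(ROWS)
drop = records[0]["val_loss"] - records[-1]["val_loss"]
print(f"validation loss fell by {drop:.4f} between step "
      f"{records[0]['step']} and step {records[-1]['step']}")
```

Passing the full table body to `parse_rows` yields one record per evaluation, which can then be handed to a plotting library of choice.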


### Framework versions

- Transformers 4.36.0.dev0
- PyTorch 2.1.1+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0