---
base_model: ai-forever/rugpt3medium_based_on_gpt2
tags:
- generated_from_trainer
model-index:
- name: my_rugpt3medium_finetune
  results: []
---


# my_rugpt3medium_finetune

This model is a fine-tuned version of [ai-forever/rugpt3medium_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3medium_based_on_gpt2) on an unspecified dataset.
It achieves the following result on the evaluation set (final logged evaluation, step 1875, epoch 34.72):
- Loss: 0.9955
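
For readers who prefer perplexity, the evaluation loss can be converted with a one-line formula. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models but is not stated in this card. The helper name `perplexity_from_loss` is hypothetical.

```python
import math

# Final evaluation loss reported in this card.
EVAL_LOSS = 0.9955


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity.

    Assumption: the loss is averaged over tokens and uses the natural log.
    """
    return math.exp(loss)


if __name__ == "__main__":
    print(f"perplexity ~ {perplexity_from_loss(EVAL_LOSS):.3f}")
```

Under that assumption the value is roughly 2.7. It is only comparable to other models that share the same tokenizer and evaluation set.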

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 3
- total_train_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 35
- mixed_precision_training: Native AMP
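
The batch-size and scheduler settings above interact, so a small sketch may help. It reproduces the effective batch size and a cosine schedule with linear warmup in plain Python. Two things are assumptions rather than facts from this card: a single training device, and a total of 1890 optimizer steps, inferred from the results table (about 54 steps per epoch over 35 epochs). The function names are hypothetical, and the schedule is a common formulation of linear warmup followed by half-cosine decay, not a copy of the trainer's code.

```python
import math

# Values taken from the hyperparameter list in this card.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 1000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 3

# Assumption: inferred from the results table (~54 steps/epoch x 35 epochs).
TOTAL_STEPS = 1890


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def lr_at(step: int,
          base_lr: float = LEARNING_RATE,
          warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 500, 1000, 1445, 1890):
        print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

If the inferred step count is right, warmup covers more than half of the run: the learning rate would peak at step 1000 (around epoch 18.5) and decay only over the remaining steps.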

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.5373        | 0.46  | 25   | 3.4828          |
| 3.5265        | 0.93  | 50   | 3.4708          |
| 3.478         | 1.39  | 75   | 3.4398          |
| 3.4851        | 1.85  | 100  | 3.3995          |
| 3.4407        | 2.31  | 125  | 3.3609          |
| 3.3731        | 2.78  | 150  | 3.3241          |
| 3.3584        | 3.24  | 175  | 3.2886          |
| 3.3267        | 3.7   | 200  | 3.2540          |
| 3.3043        | 4.17  | 225  | 3.2200          |
| 3.229         | 4.63  | 250  | 3.1853          |
| 3.2618        | 5.09  | 275  | 3.1508          |
| 3.1823        | 5.56  | 300  | 3.1164          |
| 3.172         | 6.02  | 325  | 3.0779          |
| 3.1354        | 6.48  | 350  | 3.0395          |
| 3.0899        | 6.94  | 375  | 2.9987          |
| 3.0741        | 7.41  | 400  | 2.9577          |
| 3.009         | 7.87  | 425  | 2.9140          |
| 2.9598        | 8.33  | 450  | 2.8737          |
| 2.9187        | 8.8   | 475  | 2.8294          |
| 2.9378        | 9.26  | 500  | 2.7842          |
| 2.8396        | 9.72  | 525  | 2.7374          |
| 2.8608        | 10.19 | 550  | 2.6889          |
| 2.7296        | 10.65 | 575  | 2.6405          |
| 2.7452        | 11.11 | 600  | 2.5926          |
| 2.6882        | 11.57 | 625  | 2.5389          |
| 2.6463        | 12.04 | 650  | 2.4893          |
| 2.572         | 12.5  | 675  | 2.4356          |
| 2.5384        | 12.96 | 700  | 2.3788          |
| 2.5246        | 13.43 | 725  | 2.3296          |
| 2.4055        | 13.89 | 750  | 2.2747          |
| 2.3759        | 14.35 | 775  | 2.2155          |
| 2.3351        | 14.81 | 800  | 2.1606          |
| 2.286         | 15.28 | 825  | 2.1061          |
| 2.2694        | 15.74 | 850  | 2.0504          |
| 2.1745        | 16.2  | 875  | 1.9967          |
| 2.1053        | 16.67 | 900  | 1.9411          |
| 2.1184        | 17.13 | 925  | 1.8878          |
| 2.0107        | 17.59 | 950  | 1.8362          |
| 2.027         | 18.06 | 975  | 1.7854          |
| 1.9153        | 18.52 | 1000 | 1.7304          |
| 1.9267        | 18.98 | 1025 | 1.6854          |
| 1.8131        | 19.44 | 1050 | 1.6331          |
| 1.8405        | 19.91 | 1075 | 1.5839          |
| 1.7294        | 20.37 | 1100 | 1.5370          |
| 1.7154        | 20.83 | 1125 | 1.4971          |
| 1.6573        | 21.3  | 1150 | 1.4476          |
| 1.6391        | 21.76 | 1175 | 1.4130          |
| 1.5497        | 22.22 | 1200 | 1.3727          |
| 1.5194        | 22.69 | 1225 | 1.3378          |
| 1.535         | 23.15 | 1250 | 1.3000          |
| 1.4514        | 23.61 | 1275 | 1.2714          |
| 1.4711        | 24.07 | 1300 | 1.2388          |
| 1.4105        | 24.54 | 1325 | 1.2136          |
| 1.4202        | 25.0  | 1350 | 1.1890          |
| 1.3351        | 25.46 | 1375 | 1.1679          |
| 1.3575        | 25.93 | 1400 | 1.1440          |
| 1.2882        | 26.39 | 1425 | 1.1202          |
| 1.3378        | 26.85 | 1450 | 1.1074          |
| 1.3094        | 27.31 | 1475 | 1.0864          |
| 1.2793        | 27.78 | 1500 | 1.0743          |
| 1.2377        | 28.24 | 1525 | 1.0626          |
| 1.2693        | 28.7  | 1550 | 1.0468          |
| 1.2157        | 29.17 | 1575 | 1.0368          |
| 1.2007        | 29.63 | 1600 | 1.0263          |
| 1.2376        | 30.09 | 1625 | 1.0221          |
| 1.2216        | 30.56 | 1650 | 1.0136          |
| 1.1923        | 31.02 | 1675 | 1.0102          |
| 1.2143        | 31.48 | 1700 | 1.0039          |
| 1.1764        | 31.94 | 1725 | 1.0014          |
| 1.1654        | 32.41 | 1750 | 0.9990          |
| 1.2031        | 32.87 | 1775 | 0.9976          |
| 1.1952        | 33.33 | 1800 | 0.9965          |
| 1.1852        | 33.8  | 1825 | 0.9961          |
| 1.1737        | 34.26 | 1850 | 0.9959          |
| 1.1609        | 34.72 | 1875 | 0.9955          |


### Framework versions

- Transformers 4.35.2
- PyTorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0