---
license: bigscience-bloom-rail-1.0
base_model: bigscience/bloom-1b7
tags:
- generated_from_trainer
model-index:
- name: Bloom-1b7-creative-writing-IT
  results: []
---
# Bloom-1b7-creative-writing-IT
This model is a fine-tuned version of [bigscience/bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) on a creative writing (short story) dataset:
https://huggingface.co/datasets/adambjorn/UnrelatedForgettingOverhead/viewer/creative
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
The training and evaluation data are available here: https://huggingface.co/datasets/adambjorn/UnrelatedForgettingOverhead/viewer/creative
## Training procedure
The model was instruction-tuned on the dataset as follows. Given this set of prompts:
```python
prompts = [
    "Write a creative short story based on the following title:",
    "Here is a title for a story. Craft a short narrative around it:",
    "Using the title given, develop a short story:",
    "Imagine a short story that starts with this title:",
    "Create a brief story with the following title:"
]
```
each training example is generated by concatenating a randomly chosen prompt with the example's `title` and `selftext` fields:
```python
import random

# titles and selftexts hold the dataset's 'title' and 'selftext' columns
concatenated_texts = [
    random.choice(prompts) + " " + title + "</s>" + "Story: " + selftext
    for title, selftext in zip(titles, selftexts)
]
```
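The formatting step can be sketched as a small, self-contained script. `build_example` and `build_prompt` are hypothetical helper names, the seeded `random.Random` instance is an assumption added for reproducibility, and the titles and stories are toy placeholders rather than rows from the dataset. `build_prompt` assumes that at inference time the model is prompted in the same format and left to continue after `Story: `; the card itself does not say how the model should be prompted.

```python
import random

# Prompt templates, copied verbatim from the training procedure.
PROMPTS = [
    "Write a creative short story based on the following title:",
    "Here is a title for a story. Craft a short narrative around it:",
    "Using the title given, develop a short story:",
    "Imagine a short story that starts with this title:",
    "Create a brief story with the following title:",
]


def build_example(title: str, selftext: str, rng: random.Random) -> str:
    """Hypothetical helper: random prompt + title + '</s>' + 'Story: ' + story."""
    prompt = rng.choice(PROMPTS)
    return prompt + " " + title + "</s>" + "Story: " + selftext


def build_prompt(title: str, prompt: str = PROMPTS[0]) -> str:
    """Hypothetical inference prompt in the same format, ending where the story begins."""
    return prompt + " " + title + "</s>" + "Story: "


# Toy placeholder rows standing in for the dataset's 'title' and 'selftext' columns.
titles = ["The Last Lighthouse", "A Door in the Snow"]
selftexts = ["The keeper lit the lamp one final time.", "Nobody knew who had built it."]

rng = random.Random(42)  # seeding is an assumption, added for reproducibility
examples = [build_example(t, s, rng) for t, s in zip(titles, selftexts)]
for example in examples:
    print(example)
print(build_prompt("The Last Lighthouse"))
```

Passing an explicit `rng` keeps the prompt choice reproducible without touching the global `random` state.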
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
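The batch-size and scheduler settings above can be checked with a little arithmetic. This sketch assumes a single device and zero warmup steps (no warmup is listed). `total_train_batch_size` and `linear_lr` are hypothetical helpers: `linear_lr` follows the shape of a linear warmup-then-decay schedule like the `linear` scheduler type in Transformers, but it is not a call into that library.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4


def total_train_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE,
              warmup_steps: int = 0) -> float:
    """Linear warmup (if any), then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


print(total_train_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
```

With one device, `1 x 4` reproduces the listed `total_train_batch_size: 4`, and the learning rate falls from 5e-05 at step 0 to zero at the last step.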
### Training results
Final logged training step: `{'loss': 0.0135, 'grad_norm': 0.6041152477264404, 'learning_rate': 7.446808510638299e-07, 'epoch': 9.89}`
Summary over the whole run (`train_loss` is the training loss averaged over all steps): `{'train_runtime': 1111.4187, 'train_samples_per_second': 1.71, 'train_steps_per_second': 0.423, 'train_loss': 0.4682149670225509, 'epoch': 9.89}`
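The reported metrics can be cross-checked against each other. This sketch assumes the Hugging Face Trainer convention that `train_samples_per_second` counts the dataset size times the number of epochs and `train_steps_per_second` counts optimizer updates; because the logged throughputs are rounded, the derived figures are approximate.

```python
# Figures copied from the training results and hyperparameters.
train_runtime = 1111.4187          # seconds
train_samples_per_second = 1.71
train_steps_per_second = 0.423
num_train_epochs = 10
base_lr = 5e-05
final_logged_lr = 7.446808510638299e-07

# Approximate totals implied by the rounded throughput numbers.
optimizer_steps = train_runtime * train_steps_per_second      # about 470 updates
samples_seen = train_runtime * train_samples_per_second       # about 1,900 samples
examples_per_epoch = samples_seen / num_train_epochs          # about 190 examples
steps_per_epoch = optimizer_steps / num_train_epochs          # about 47 updates

# Fraction of the initial learning rate left at the last logged step.
remaining_fraction = final_logged_lr / base_lr

print(f"optimizer steps    ~ {optimizer_steps:.1f}")
print(f"examples per epoch ~ {examples_per_epoch:.1f}")
print(f"steps per epoch    ~ {steps_per_epoch:.1f}")
print(f"lr remaining       ~ {remaining_fraction:.2%}")
```

Under these assumptions the run made about 470 optimizer updates (roughly 47 per epoch) over a training split of roughly 190 examples, and the last logged learning rate is about 1.5% of the initial 5e-05, consistent with a linear decay that ends near the final step.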
### Framework versions
- Transformers 4.38.1
- Pytorch 2.2.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2