---
license: bigscience-bloom-rail-1.0
base_model: bigscience/bloom-1b7
tags:
- generated_from_trainer
model-index:
- name: Bloom-1b7-creative-writing-IT
  results: []
---


# Bloom-1b7-creative-writing-IT

This model is a fine-tuned version of [bigscience/bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) on a creative writing (short story) dataset:

https://huggingface.co/datasets/adambjorn/UnrelatedForgettingOverhead/viewer/creative

## Model description

More information needed

## Intended uses & limitations

More information needed
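
Given the instruction format described under "Training procedure" below, the model can be prompted with a story title to generate a short narrative. A minimal usage sketch (illustrative only: the repository id is a placeholder for this model's full Hub path, and the title and generation settings are assumptions, not values from this card):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Bloom-1b7-creative-writing-IT"  # placeholder: replace with the full Hub repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Prompt follows the training format: instruction + title + "</s>" + "Story: "
    prompt = "Write a creative short story based on the following title: The Lighthouse Keeper</s>Story: "
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, top_p=0.9, temperature=0.8)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))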

## Training and evaluation data

Training and evaluation data here: https://huggingface.co/datasets/adambjorn/UnrelatedForgettingOverhead/viewer/creative
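
The data can be loaded with the `datasets` library; a short sketch (the `"creative"` configuration name is an assumption based on the viewer URL above):

    from datasets import load_dataset

    dataset = load_dataset("adambjorn/UnrelatedForgettingOverhead", "creative")
    print(dataset)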

## Training procedure

The model was instruction-tuned on the dataset in the following way.

Given the set of prompts:

    prompts = [
        "Write a creative short story based on the following title:",
        "Here is a title for a story. Craft a short narrative around it:",
        "Using the title given, develop a short story:",
        "Imagine a short story that starts with this title:",
        "Create a brief story with the following title:"
    ]

each training example is generated by concatenating one of the prompts with the `title` and `selftext` fields in the following way:

    concatenated_texts = [random.choice(prompts) + " " + title + "</s>" + "Story: " + selftext for title, selftext in zip(titles, selftexts)]
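
To make the format concrete, here is a hedged sketch of how such concatenated texts could be tokenized and collated for causal language modeling. It is illustrative only: the original training script is not part of this card, and the example title/story, `max_length`, and collator choice are assumptions.

    from transformers import AutoTokenizer, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-1b7")

    # One concatenated example in the format described above (title and story are made up).
    concatenated_texts = [
        "Using the title given, develop a short story: The Lighthouse Keeper</s>Story: Every night she climbed the spiral stairs."
    ]

    tokenized = tokenizer(concatenated_texts, truncation=True, max_length=1024)
    # mlm=False yields standard causal-LM labels (the model shifts them internally).
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    batch = collator([{"input_ids": ids} for ids in tokenized["input_ids"]])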

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
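
For reference, a hedged sketch of how these settings map onto `transformers.TrainingArguments` (the original training script is not included in this card; `output_dir` is a placeholder):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="Bloom-1b7-creative-writing-IT",  # placeholder
        learning_rate=5e-05,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=8,
        seed=42,
        gradient_accumulation_steps=4,   # effective train batch size of 4
        lr_scheduler_type="linear",
        num_train_epochs=10,
        fp16=True,                       # Native AMP mixed precision
    )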

### Training results

Final logged training step (epoch 9.89):

    {'loss': 0.0135, 'grad_norm': 0.6041152477264404, 'learning_rate': 7.446808510638299e-07, 'epoch': 9.89}

Overall training summary:

    {'train_runtime': 1111.4187, 'train_samples_per_second': 1.71, 'train_steps_per_second': 0.423, 'train_loss': 0.4682149670225509, 'epoch': 9.89}

### Framework versions

- Transformers 4.38.1
- Pytorch 2.2.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2