Update README.md

README.md

## Training procedure

Trained on 16 Graphcore Mk2 IPUs using [optimum-graphcore](https://github.com/huggingface/optimum-graphcore).
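
Execution details such as replication and pipelining come from an `IPUConfig` rather than from the model itself; the command below pulls one from the Hub via `--ipu_config_name`. As a minimal sketch of that resolution step, assuming optimum-graphcore and the Poplar SDK it depends on are installed:

```
# Sketch: fetch the IPU execution config that --ipu_config_name resolves below.
# Assumes optimum-graphcore (and its Poplar SDK dependency) is installed;
# this only downloads a JSON config, no IPU hardware is touched.
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig.from_pretrained("Graphcore/gpt2-small-ipu")
print(ipu_config)
```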
Command line:

```
python examples/language-modeling/run_clm.py \
    --model_name_or_path gpt2 \
    --ipu_config_name Graphcore/gpt2-small-ipu \
    --dataset_name wikitext \
    --dataset_config_name wikitext-103-raw-v1 \
    --do_train \
    --do_eval \
    --num_train_epochs 10 \
    --dataloader_num_workers 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 128 \
    --output_dir /tmp/clm_output \
    --logging_steps 5 \
    --learning_rate 1e-5 \
    --lr_scheduler_type linear \
    --loss_scaling 16384 \
    --weight_decay 0.01 \
    --warmup_ratio 0.1 \
    --ipu_config_overrides="embedding_serialization_factor=4,optimizer_state_offchip=true,inference_device_iterations=5" \
    --dataloader_drop_last \
    --pod_type pod16
```
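
Here `--per_device_train_batch_size 1` is only the micro-batch size: samples per weight update also scale with `--gradient_accumulation_steps` and the replication factor defined by the IPU config. A quick arithmetic sketch, with the replication factor as an assumed placeholder rather than a value stated on this card:

```
# Effective batch-size arithmetic for the run above (sketch).
per_device_train_batch_size = 1
gradient_accumulation_steps = 128
replication_factor = 4  # assumption; read the real value from Graphcore/gpt2-small-ipu

samples_per_weight_update = (
    per_device_train_batch_size
    * gradient_accumulation_steps
    * replication_factor
)
print(samples_per_weight_update)  # 512 under the assumed replication factor
```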
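The checkpoint written to `/tmp/clm_output` should be loadable as a regular `transformers` model directory, so it can be sanity-checked off-IPU. A minimal generation sketch (prompt and generation settings are arbitrary):

```
# Load the checkpoint saved by run_clm.py (--output_dir /tmp/clm_output)
# and generate a short continuation; runs on CPU/GPU, no IPU required.
from transformers import pipeline

generator = pipeline("text-generation", model="/tmp/clm_output")
print(generator("The game began development in", max_new_tokens=40)[0]["generated_text"])
```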
### Training hyperparameters

The following hyperparameters were used during training: