Update model description
README.md
CHANGED
@@ -6,6 +6,11 @@ This model contains just the `IPUConfig` files for running the [gpt2-medium](htt
 
 **This model contains no model weights, only an IPUConfig.**
 
+## Model description
+GPT2 is a large transformer-based language model built from transformer decoder blocks, whereas BERT uses transformer encoder blocks. Layer normalisation is moved to the input of each sub-block, similar to a pre-activation residual network, and an additional layer normalisation is added after the final self-attention block.
+
+Paper link: [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf)
+
 ## Usage
 
 ```
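
The pre-activation layout described in the new paragraph, with layer normalisation at the input of each sub-block rather than the output, can be sketched as follows. This is an illustrative PyTorch sketch, not GPT2's actual implementation; the class name and dimensions are made up for the example, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class PreLNDecoderBlock(nn.Module):
    """Schematic GPT2-style decoder block: layer norm is applied to the
    *input* of each sub-block (pre-activation), with residuals around both."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)                                    # normalise before attention
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        x = x + self.mlp(self.ln2(x))                      # normalise before the MLP
        return x
```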
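
Since this repo ships only the `IPUConfig`, here is a minimal sketch of how such a config is typically consumed with Optimum Graphcore. The repo id `Graphcore/gpt2-medium-ipu` is an assumption based on the model name; the weights come from the upstream `gpt2-medium` checkpoint.

```python
from transformers import GPT2LMHeadModel
from optimum.graphcore import IPUConfig

# Assumed repo id; this repository holds only the IPUConfig, no weights.
ipu_config = IPUConfig.from_pretrained("Graphcore/gpt2-medium-ipu")

# The actual weights come from the upstream gpt2-medium checkpoint.
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# The config and model are then passed together to optimum-graphcore's
# IPUTrainer (ipu_config=ipu_config) for training/fine-tuning on IPUs.
print(ipu_config)
```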