Update README.md #2
opened by ybelkada

README.md CHANGED
@@ -122,11 +122,11 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
 * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
 
-*
+* 760 million parameters:
 
-*
+* 24 layers, 16 attention heads
 
-* Hidden layers are
+* Hidden layers are 1536-dimensional
 
 * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
 
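
As a quick sanity check on the figures this change fills in (24 layers, 16 attention heads, 1536-dimensional hidden layers), the sketch below derives the per-head dimension and a rough count of the weights in the transformer blocks. The `ModelShape` class and its method names are hypothetical helpers written for this note, not part of any BLOOM code. The block estimate assumes a standard layout with four `d x d` attention projections and an MLP that expands to 4x the hidden size, which the diff itself does not state. It ignores embeddings, biases, and layer norms, so it is a ballpark figure only and is not meant to reproduce the 760 million total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical container for the three shape values named in the diff."""

    n_layers: int
    n_heads: int
    hidden_size: int

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden vector,
        # so the hidden size must divide evenly by the number of heads.
        if self.hidden_size % self.n_heads != 0:
            raise ValueError("hidden_size must be divisible by n_heads")
        return self.hidden_size // self.n_heads

    def approx_block_params(self, mlp_ratio: int = 4) -> int:
        # ASSUMPTION: standard transformer block, not confirmed by the diff.
        # Attention: Q, K, V and output projections, each d x d weights.
        d = self.hidden_size
        attention = 4 * d * d
        # MLP: one d -> mlp_ratio*d projection and one back to d.
        mlp = 2 * mlp_ratio * d * d
        # Embeddings, biases, and layer norms are deliberately left out.
        return self.n_layers * (attention + mlp)


# Values taken from the added lines of the diff.
shape = ModelShape(n_layers=24, n_heads=16, hidden_size=1536)
print("head dimension:", shape.head_dim)
print("approx. block weights:", shape.approx_block_params())
```

Under these assumptions the hidden size splits evenly across the heads, which is one way to confirm that the layer, head, and hidden-size values in the updated README are mutually consistent.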