Commit df69d83, committed by ybelkada
Parent(s): 0a1d0a4
Update README.md (#1)
README.md CHANGED
@@ -122,11 +122,11 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
 * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
 
-*
+* 2.5 billion parameters:
 
-*
+* 30 layers, 32 attention heads
 
-* Hidden layers are
+* Hidden layers are 2560-dimensional
 
 * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
 
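As a rough sanity check on the figures this commit fills in, the sketch below derives the per-head dimension and a back-of-the-envelope parameter count for the transformer blocks. It assumes a standard decoder block with a 4x-wide MLP (about 12 × hidden² weights per layer), which the diff does not state, and it ignores biases, layer norms, and the token embedding matrix, so the result is only expected to land in the neighborhood of the quoted 2.5 billion.

```python
# Values taken from the added README lines.
n_layers = 30
n_heads = 32
hidden = 2560

# Each attention head works on an equal slice of the hidden dimension.
head_dim = hidden // n_heads

# Assumption (not stated in the diff): per layer, attention uses ~4 * hidden^2
# weights (Q, K, V, output projection) and a 4x-wide MLP uses ~8 * hidden^2.
params_per_block = 12 * hidden ** 2
total_block_params = n_layers * params_per_block

print(f"head dimension: {head_dim}")
print(f"approx. transformer block parameters: {total_block_params / 1e9:.2f}B")
```

The estimate excludes the embedding matrix because the vocabulary size is not given in this hunk; ALiBi itself adds no learned positional parameters.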