Update README.md
README.md
CHANGED
@@ -443,6 +443,7 @@ These are the main differences relative to the original T5 architecture:
 - Shared Input-Output Embeddings
 - No biases
 - Bidirectional attention
+- Layer Norm with `center_scale_at_zero` and final layer with `use_scale=False`
 
 If you are looking for the language models models, here are the available versions:
 - [3B](https://huggingface.co/jbochi/madlad400-3b-mt)
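The added README line can be illustrated with a minimal sketch in plain Python. It assumes the two flags behave as in a T5-style RMS layer norm: `center_scale_at_zero` stores the learned scale as an offset from zero, so the effective multiplier is `scale + 1`, and `use_scale=False` skips the learned scale entirely. The function name `t5_layer_norm` and the `eps` default are hypothetical and chosen for illustration only; this is not the model's actual implementation.

```python
import math


def t5_layer_norm(x, scale=None, center_scale_at_zero=False, use_scale=True, eps=1e-6):
    """Illustrative T5-style layer norm over a 1-D list of floats.

    Assumption: flag semantics follow a T5-style RMS norm; names are hypothetical.
    """
    # T5 layer norm subtracts no mean and adds no bias:
    # it only divides by the root mean square of the features.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    y = [v * inv_rms for v in x]

    # use_scale=False: return the normalized values with no learned scale.
    if not use_scale:
        return y

    # Default scale: zeros when centered at zero, ones otherwise,
    # so both conventions start as an identity multiplier.
    if scale is None:
        scale = [0.0 if center_scale_at_zero else 1.0] * len(x)

    if center_scale_at_zero:
        # The stored parameter is an offset from zero: multiply by (scale + 1).
        return [v * (s + 1.0) for v, s in zip(y, scale)]
    return [v * s for v, s in zip(y, scale)]
```

Under these assumptions, porting weights to a framework whose RMS norm multiplies by the stored weight directly would mean adding 1 to each stored scale, and treating the final layer norm as having an all-ones weight.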