add DeepSpeed as another solution for this huge model.
README.md (changed):

````diff
@@ -20,7 +20,8 @@ t5 = transformers.T5ForConditionalGeneration.from_pretrained('t5-11b', use_cdn =
 ```

 Secondly, a single GPU will most likely not have enough memory to even load the model into memory as the weights alone amount to over 40 GB.
-Model parallelism has to be used here to overcome this problem as is explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
+- Model parallelism has to be used here to overcome this problem as is explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
+- DeepSpeed's ZeRO-Offload is another approach as explained in this [post](https://github.com/huggingface/transformers/issues/9996).

 ## [Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)

````
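For readers who want to see what the model-parallel route in the diff above can look like in practice, here is a minimal sketch using the naive `parallelize()` API that `transformers` ships for T5 (deprecated in recent releases). The four-GPU device map is an assumption for illustration only and is not taken from the linked PR:

```python
# Sketch: split t5-11b's transformer blocks across several GPUs with
# transformers' (now deprecated) naive model parallelism for T5.
# Assumes 4 GPUs and enough CPU RAM to materialize the ~45 GB of weights
# before they are moved to the devices.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-11b")
model = T5ForConditionalGeneration.from_pretrained("t5-11b")

# t5-11b has 24 blocks per stack; assign 6 blocks to each of the 4 GPUs.
device_map = {
    0: list(range(0, 6)),
    1: list(range(6, 12)),
    2: list(range(12, 18)),
    3: list(range(18, 24)),
}
model.parallelize(device_map)

# Inputs have to live on the first device in the map.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```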
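The DeepSpeed route can be sketched as a ZeRO configuration that offloads optimizer state to CPU RAM, handed to the `transformers` Trainer. The exact key names vary between DeepSpeed and `transformers` versions, and the script name in the comment is a placeholder, so treat this as an illustration rather than a drop-in config:

```python
# Sketch: a minimal DeepSpeed ZeRO-Offload configuration for use with the
# transformers Trainer. Older DeepSpeed releases used "cpu_offload": true
# instead of the "offload_optimizer" block shown here.
import json

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in CPU RAM
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" placeholders are filled in from the Trainer's own arguments
    # in recent transformers versions.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# Any Trainer-based script can then be launched through the DeepSpeed launcher
# and pointed at this file, roughly:
#   deepspeed your_trainer_script.py --deepspeed ds_config.json ...
```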