add to docs (#703)
- README.md +2 -0
- docs/faq.md +14 -0
README.md
CHANGED
@@ -901,6 +901,8 @@ CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora ...
 ## Common Errors 🧰
 
+See also the [FAQs](./docs/faq.md).
+
 > If you encounter a 'Cuda out of memory' error, it means your GPU ran out of memory during the training process. Here's how to resolve it:
 
 Please reduce any below
docs/faq.md
ADDED
@@ -0,0 +1,14 @@
+# Axolotl FAQs
+
+
+> The trainer stopped and hasn't progressed in several minutes.
+
+This is usually an issue with the GPUs communicating with each other. See the [NCCL doc](../docs/nccl.md).
+
+> Exitcode -9
+
+This usually happens when you run out of system RAM.
+
+> Exitcode -7 while using deepspeed
+
+Try upgrading deepspeed with: `pip install -U deepspeed`
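
Reviewer note on the two `Exitcode` entries: both codes are negative. Assuming the launcher follows the Python `subprocess` convention, where a return code of `-N` means the process was terminated by signal `N`, `-9` would correspond to `SIGKILL`. That is the signal the Linux out-of-memory killer sends, which fits the system RAM explanation. The sketch below only illustrates that convention. `describe_exit_code` is a hypothetical helper and not part of axolotl, and the signal name is looked up on the local machine because most signal numbers vary by platform.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a process exit code, treating -N as 'terminated by signal N'.

    Assumption: the launcher follows the Python subprocess convention for
    negative return codes. This helper is illustrative, not part of axolotl.
    """
    if code >= 0:
        return f"exited with status {code}"
    try:
        # Look the name up locally: apart from a few fixed numbers (such as
        # 9 for SIGKILL), signal numbers differ between platforms.
        name = signal.Signals(-code).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {-code} ({name})"


if __name__ == "__main__":
    for code in (-9, -7, 0):
        print(code, "->", describe_exit_code(code))
```

Running the same lookup for `-7` on the training machine shows which signal the deepspeed failure maps to there; the FAQ's advice to upgrade deepspeed stands either way.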