Load model issues in text-generation-webui
Running into an issue while using Runpod with an A100. After downloading the model, I get this error message for all versions of the model (both Qn_0 and Qn_K).
You mentioned that you got it working on a single A100; did you need to do any extra steps to get text-generation-webui working with Mixtral models?
Traceback (most recent call last):
File "/workspace/text-generation-webui/modules/ui_model_menu.py", line 209, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
File "/workspace/text-generation-webui/modules/models.py", line 89, in load_model
output = load_func_map[loader](model_name)
File "/workspace/text-generation-webui/modules/models.py", line 259, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
File "/workspace/text-generation-webui/modules/llamacpp_model.py", line 91, in from_pretrained
result.model = Llama(**params)
File "/usr/local/lib/python3.10/dist-packages/llama_cpp_cuda/llama.py", line 923, in init
self._n_vocab = self.n_vocab()
File "/usr/local/lib/python3.10/dist-packages/llama_cpp_cuda/llama.py", line 2184, in n_vocab
return self._model.n_vocab()
File "/usr/local/lib/python3.10/dist-packages/llama_cpp_cuda/llama.py", line 250, in n_vocab
assert self.model is not None
AssertionError
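For what it's worth, that AssertionError means llama-cpp-python's underlying model handle is still null, i.e. the GGUF file never actually loaded (often because the bundled llama.cpp build is too old for Mixtral-style GGUFs). A minimal sketch to check whether the file loads outside the webui; the model path and context size here are just placeholders:

```python
# Try loading the downloaded GGUF directly with llama-cpp-python,
# to separate a loader problem from a webui problem.
from llama_cpp import Llama

try:
    llm = Llama(
        model_path="/workspace/models/your-model.Q5_K_M.gguf",  # placeholder path
        n_gpu_layers=-1,  # offload all layers to the A100
        n_ctx=4096,
    )
    print("model loaded, vocab size:", llm.n_vocab())
except Exception as exc:
    # If this fails too, the installed llama-cpp-python likely needs upgrading.
    print("load failed:", exc)
```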
You need to update Transformers on Runpod before launching it. I followed this tutorial: https://youtu.be/WjiX3lCnwUI?si=RnhYQR4eWWfeXCms&t=560
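A minimal sketch of one way to do that upgrade step from inside the container before starting the UI (the linked video may use a different command, so treat this as an assumption):

```python
# Upgrade Transformers in the same Python environment the webui uses.
import subprocess
import sys

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--upgrade", "transformers"]
)
```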
4x13B works on a single A100, using 96% of the GPU with FP16, so use that.
For GGUF, I think the latest Ooba update works with the latest llama.cpp release, but I don't use GGUF in Ooba. Sorry!
tl;dr: If you use an A100 on Runpod, use the unquantized files; it works!
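A minimal sketch of loading the unquantized weights in FP16 on a single A100 outside the webui; the model path is a placeholder for whichever unquantized repo you downloaded:

```python
# Load the full-precision weights with Transformers in FP16, as suggested above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/workspace/models/your-unquantized-model"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # FP16, reportedly ~96% of a single A100
    device_map="auto",
)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```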
Awesome! Thank you! Love the work you have been doing!