pabloce/Tess-v2.5-Qwen2-72B

This model is a converted version of migtissera/Tess-v2.5-Qwen2-72B in GGUF format.

For more details on the original model, please refer to its model card.

Installation

To use this model with llama.cpp, first install llama.cpp via Homebrew (works on macOS and Linux):

brew install llama.cpp

Usage

Command Line Interface (CLI)

To use the model via the CLI, run the following command:

llama-cli --hf-repo pabloce/Tess-v2.5-Qwen2-72B-gguf --hf-file tess-2.5-qwen-2-70b-q3_k_m.gguf -p "The meaning to life and the universe is"

Server

To start the llama.cpp server with this model, use the following command:

llama-server --hf-repo pabloce/Tess-v2.5-Qwen2-72B-gguf --hf-file tess-2.5-qwen-2-70b-q3_k_m.gguf -c 2048
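
Once the server is up, you can query it over HTTP. A minimal Python sketch, assuming the server is running on its default port (8080) and using its /completion endpoint; the helper names below are illustrative, not part of llama.cpp itself:

```python
import json
import urllib.request

# Assumption: llama-server is listening locally on the default port 8080.
SERVER_URL = "http://localhost:8080/completion"

def build_payload(prompt: str, n_predict: int = 128, temperature: float = 0.7) -> dict:
    """Assemble the JSON body accepted by llama.cpp's /completion route."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,    # maximum number of tokens to generate
        "temperature": temperature,
    }

def query_server(prompt: str) -> str:
    """POST the prompt to the running server and return the generated text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

For example, query_server("The meaning to life and the universe is") returns the model's continuation of the prompt.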

Alternative Usage

You can also use this checkpoint directly by following the usage steps listed in the llama.cpp repository.

  1. Clone the llama.cpp repository from GitHub:

    git clone https://github.com/ggerganov/llama.cpp
    
  2. Navigate to the llama.cpp folder and build it with the LLAMA_CURL=1 flag. You can also include other hardware-specific flags (e.g., LLAMA_CUDA=1 for Nvidia GPUs on Linux):

    cd llama.cpp && LLAMA_CURL=1 make
    
  3. Run inference through the main binary (named llama-cli in recent llama.cpp builds):

    ./main --hf-repo pabloce/Tess-v2.5-Qwen2-72B-gguf --hf-file tess-2.5-qwen-2-70b-q3_k_m.gguf -p "The meaning to life and the universe is"
    

    or start the server:

    ./server --hf-repo pabloce/Tess-v2.5-Qwen2-72B-gguf --hf-file tess-2.5-qwen-2-70b-q3_k_m.gguf -c 2048
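
Instead of the --hf-repo/--hf-file flags, you can also fetch the GGUF file yourself and pass its local path with -m. A minimal Python sketch, assuming the huggingface_hub package is installed; pick_quant is a hypothetical helper that chooses between the repo's quantizations (e.g. 3-bit vs. 4-bit) by filename suffix:

```python
def download_gguf(repo_id: str, filename: str) -> str:
    """Fetch a GGUF file from the Hugging Face Hub and return its local path."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    return hf_hub_download(repo_id=repo_id, filename=filename)

def pick_quant(filenames, preference=("q4_k_m", "q3_k_m")):
    """Hypothetical helper: return the first file matching a preferred
    quantization suffix, or None if nothing matches."""
    for quant in preference:
        for name in filenames:
            if quant in name.lower():
                return name
    return None
```

For example, download_gguf("pabloce/Tess-v2.5-Qwen2-72B-gguf", "tess-2.5-qwen-2-70b-q3_k_m.gguf") downloads the file and returns a path you can pass to ./main -m or ./server -m.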
    
Model details

Format: GGUF
Model size: 72.7B params
Architecture: qwen2
Available quantizations: 3-bit, 4-bit
