---
license: apache-2.0
library_name: transformers
---

# GGUF Models: Conversion and Upload to Hugging Face

This guide explains what GGUF models are, how to convert models to GGUF format, and how to upload them to the Hugging Face Hub.

## What is GGUF?

GGUF is a binary file format for storing large language models, designed by the llama.cpp project for efficient inference on consumer hardware. Key features of GGUF models include:

- Successor to the GGML format
- Designed for efficient quantization and inference
- Supports a wide range of model architectures
- Commonly used with libraries like llama.cpp for running LLMs on consumer hardware
- Allows for reduced model size while maintaining good performance

## Why and How to Convert to GGUF Format

Converting models to GGUF format offers several advantages:

1. **Reduced file size**: GGUF models can be quantized to lower precision (e.g., 4-bit or 8-bit), significantly reducing model size.
2. **Faster inference**: The format is optimized for quick loading and efficient inference on CPUs and consumer GPUs.
3. **Cross-platform compatibility**: GGUF models can be used with libraries like llama.cpp, enabling deployment on various platforms.

To convert a model to GGUF format, we'll use the `convert-hf-to-gguf.py` script from the llama.cpp repository (named `convert_hf_to_gguf.py` in newer checkouts).

### Steps to Convert a Model to GGUF

1. **Clone the llama.cpp repository**:

   ```bash
   git clone https://github.com/ggerganov/llama.cpp.git
   ```

2. **Install the required Python libraries**:

   ```bash
   pip install -r llama.cpp/requirements.txt
   ```

3. **Verify the script and review its options**:

   ```bash
   python llama.cpp/convert-hf-to-gguf.py -h
   ```

4. **Convert the Hugging Face model to GGUF**:

   ```bash
   python llama.cpp/convert-hf-to-gguf.py ./models/8B/Meta-Llama-3-8B-Instruct --outfile Llama3-8B-instruct-Q8_0.gguf --outtype q8_0
   ```

   This command converts the model with 8-bit quantization (`q8_0`). The script can also write `f16`, `bf16`, or `f32` output; lower-precision variants such as 4-bit are produced afterwards with llama.cpp's quantization tool (a command-line sketch appears near the end of this guide).

## Uploading GGUF Models to Hugging Face

Once you have your GGUF model, you can upload it to Hugging Face for easy sharing and versioning.

### Prerequisites

- Python 3.8+
- `huggingface_hub` library installed (`pip install huggingface_hub`)
- A Hugging Face account and API token

### Upload Script

Save the following script as `upload_gguf_model.py`:

```python
import os

from huggingface_hub import HfApi


def push_to_hub(hf_token, local_path, model_id):
    """Upload a single GGUF file to a Hugging Face model repository."""
    api = HfApi(token=hf_token)
    api.create_repo(model_id, exist_ok=True, repo_type="model")
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=os.path.basename(local_path),  # keep the same file name in the repo
        repo_id=model_id,
    )
    print(f"Model successfully pushed to {model_id}")


# Example usage
hf_token = "your_huggingface_token_here"
local_path = "/path/to/your/model.gguf"
model_id = "your-username/your-model-name"

push_to_hub(hf_token, local_path, model_id)
```

### Usage

1. Replace the placeholder values in the script:
   - `your_huggingface_token_here`: Your Hugging Face API token
   - `/path/to/your/model.gguf`: The local path to your GGUF model file
   - `your-username/your-model-name`: Your desired model ID on Hugging Face
2. Run the script:

   ```bash
   python upload_gguf_model.py
   ```

## Best Practices

- Include a `README.md` file with your model, detailing its architecture, quantization, and usage instructions.
- Add a `config.json` file with model configuration details.
- Include any necessary tokenizer files.
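If you keep the GGUF file and these companion files (`README.md`, `config.json`, tokenizer files) together in one directory, they can be pushed in a single call with `HfApi.upload_folder`. The snippet below is a minimal sketch using the same placeholder token, directory, and repository names as above; the `allow_patterns` filter is illustrative, not required.

```python
from huggingface_hub import HfApi


def push_folder_to_hub(hf_token, folder_path, model_id):
    """Upload a directory containing a GGUF file plus its companion files."""
    api = HfApi(token=hf_token)
    api.create_repo(model_id, exist_ok=True, repo_type="model")
    api.upload_folder(
        folder_path=folder_path,  # local directory with the model and metadata files
        repo_id=model_id,
        repo_type="model",
        # Only push the files the repository actually needs.
        allow_patterns=["*.gguf", "README.md", "*.json", "tokenizer*"],
    )
    print(f"Folder successfully pushed to {model_id}")


# Example usage (placeholder values)
push_folder_to_hub(
    hf_token="your_huggingface_token_here",
    folder_path="/path/to/your/local/model/directory",
    model_id="your-username/your-model-name",
)
```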
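As noted in the conversion section, quantization levels below 8-bit are not produced by the conversion script itself but by llama.cpp's quantization tool. The commands below are a minimal sketch, assuming llama.cpp has been built locally (the binary is named `quantize` in older checkouts) and that an `f16` GGUF file was produced earlier with `--outtype f16`; all file names are placeholders.

```bash
# Build llama.cpp with CMake
cmake -S llama.cpp -B llama.cpp/build
cmake --build llama.cpp/build --config Release

# Quantize the f16 GGUF file down to 4-bit (Q4_K_M is a common choice)
./llama.cpp/build/bin/llama-quantize \
    Llama3-8B-instruct-f16.gguf \
    Llama3-8B-instruct-Q4_K_M.gguf \
    Q4_K_M
```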
## References

1. [llama.cpp GitHub Repository](https://github.com/ggerganov/llama.cpp)
2. [GGUF Format Discussion](https://github.com/ggerganov/llama.cpp/discussions/2948)
3. [Hugging Face Documentation](https://huggingface.co/docs)

For more detailed information and updates, please refer to the official documentation of llama.cpp and Hugging Face.