Update README.md
README.md
CHANGED
@@ -83,6 +83,29 @@ The following hyperparameters were used during QA tuning:
 - num_epochs: 2.0
 - weight_decay: 0.0
 
+## Versions
+
+This repository contains:
+- pytorch_model.bin: standard version (bfloat16)
+- model.safetensors: same as pytorch_model.bin, but in safetensors format
+- gptq_model-8bit-128g.safetensors: 8-bit quantized version for inference speedup and low-VRAM GPUs
+- gptq_model-4bit-128g.safetensors: 4-bit quantized version for even faster inference and lower VRAM requirements, at lower quality
+
+When using one of the quantized versions, make sure to pass the quantization configuration:
+```json
+{
+  "bits": <4 or 8 depending on the version>,
+  "group_size": 128,
+  "damp_percent": 0.01,
+  "desc_act": false,
+  "static_groups": false,
+  "sym": true,
+  "true_sequential": true,
+  "model_name_or_path": null,
+  "model_file_base_name": null
+}
+```
+
 ## Example output
 
 **User:**
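For reference, a minimal loading sketch using the AutoGPTQ library, whose `BaseQuantizeConfig` fields correspond one-to-one to the keys in the JSON above. The repo id is a placeholder for this repository's id, and the exact keyword set may vary across `auto_gptq` versions:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Mirrors the quantization configuration from the README.
# Use bits=8 with gptq_model-8bit-128g, bits=4 with gptq_model-4bit-128g.
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    damp_percent=0.01,
    desc_act=False,
    static_groups=False,
    sym=True,
    true_sequential=True,
)

model = AutoGPTQForCausalLM.from_quantized(
    "<repo-id>",                            # placeholder: this repository's id
    model_basename="gptq_model-4bit-128g",  # file name without the .safetensors suffix
    use_safetensors=True,
    device="cuda:0",
    quantize_config=quantize_config,
)
```

Recent transformers versions (4.32+, with optimum and auto-gptq installed) can also load GPTQ checkpoints directly through `AutoModelForCausalLM.from_pretrained`, picking up the quantization settings from the model config.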