mo137 committed
Commit 7c54806 • Parent: 2c89bdc

Update README.md

Files changed (1): README.md +6 -6
README.md CHANGED
@@ -5,16 +5,16 @@ EXL2 quants of [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)
 
 My quants are meant to be a tight fit in 24 GB VRAM. The following VRAM usage numbers assume **8k context**.
 
-bpw|head|4 bit cache|16 bit cache
---:|--:|--:|--:
-👉 [**5.8**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8_h8)|8 bit|21.85 GB|**23.69 GB**
-👉 [**6.5**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5_h8)|8 bit|**23.81 GB**|25.65 GB
-👉 [**6.6**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.6_h6)|**6 bit**|**23.86 GB**|25.70 GB
+bpw|head|4 bit cache|16 bit cache|Notes
+--:|--:|--:|--:|:--
+[**5.8**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8_h8)|**8 bit**|21.85 GB|**23.69 GB**|fits 16 bit cache, but lower bpw
+👉 [**6.5**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5_h8)|**8 bit**|**23.81 GB**|25.65 GB|👈 my recommendation
+[**6.6**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.6_h6)|6 bit|**23.86 GB**|25.70 GB|slightly higher bpw, but less precise head
 
 For this model, the difference between a 6 bit and an 8 bit head is ~300 MB, which is not huge. It could be exchanged for about 0.1 bpw in the body.
 
 ---
-Check out turboderp's quants & `measurement.json`:
+Check out turboderp's quants & measurement.json:
 [3.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.0bpw)
 [3.50 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.5bpw)
 [4.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/4.0bpw)
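To make the cache column in the table concrete: below is a minimal sketch of loading one of these quants with a 4 bit K/V cache via the `exllamav2` library. It assumes one of the branches above has already been downloaded locally; the path, context length, and prompt are placeholders, not part of this repo.

```python
# Minimal sketch (untested): load an EXL2 quant with a 4 bit K/V cache.
# Assumes a local download of one of the branches above; the path and
# generation settings below are placeholders.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("./gemma-2-27b-it-exl2")
config.max_seq_len = 8192                    # 8k context, matching the table above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # 4 bit cache; use ExLlamaV2Cache for 16 bit
model.load_autosplit(cache)                  # load weights, splitting across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Why is the sky blue?", max_new_tokens=200))
```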
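The README's claim that the ~300 MB head difference trades against about 0.1 bpw in the body also checks out as rough arithmetic, assuming the body holds on the order of 27 billion weights (an approximation, not an exact parameter count):

```python
# Rough sanity check (assumption: ~27e9 body weights; treat this as an
# order-of-magnitude estimate, not an exact figure).
weights = 27e9
extra_bytes = 0.1 * weights / 8       # 0.1 extra bits per weight, in bytes
print(f"{extra_bytes / 1e6:.0f} MB")  # ~338 MB, in line with the ~300 MB head difference
```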