mgoin committed on
Commit
9755602
1 Parent(s): 3b8f267

Update README.md

Files changed (1)
  1. README.md +29 -0
README.md CHANGED
@@ -1,6 +1,35 @@
+ ---
+ tags:
+ - fp8
+ - vllm
+ license: llama3
+ license_link: https://llama.meta.com/llama3/license/
+ language:
+ - en
+ ---
 
 
 
+ # Meta-Llama-3-8B-Instruct-FP8
+
+ ## Model Overview
+ - **Model Architecture:** Meta-Llama-3
+ - **Input:** Text
+ - **Output:** Text
+ - **Model Optimizations:**
+   - **Weight quantization:** FP8
+   - **Activation quantization:** FP8
+   - **KV cache quantization:** FP8
+ - **Intended Use Cases:** Intended for commercial and research use in English. Like [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), this model is intended for assistant-like chat.
+ - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
+ - **Release Date:** 6/8/2024
+ - **Version:** 1.0
+ - **License(s):** [Llama3](https://llama.meta.com/llama3/license/)
+ - **Model Developers:** Neural Magic
+
+ Quantized version of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
+
+
  ```
  lm_eval --model vllm --model_args pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V,kv_cache_dtype=fp8,add_bos_token=True --tasks gsm8k --num_fewshot 5 --batch_size auto