nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V

mgoin committed on Aug 28

Commit 9755602 • 1 Parent(s): 3b8f267

Update README.md

Files changed (1)

README.md +29 -0

@@ -1,6 +1,35 @@
 ```
 lm_eval --model vllm --model_args pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V,kv_cache_dtype=fp8,add_bos_token=True --tasks gsm8k --num_fewshot 5 --batch_size auto

+---
+tags:
+- fp8
+- vllm
+license: llama3
+license_link: https://llama.meta.com/llama3/license/
+language:
+- en
+---
+# Meta-Llama-3-8B-Instruct-FP8
+## Model Overview
+- **Model Architecture:** Meta-Llama-3
+  - **Input:** Text
+  - **Output:** Text
+- **Model Optimizations:**
+  - **Weight quantization:** FP8
+  - **Activation quantization:** FP8
+  - **KV cache quantization:** FP8
+- **Intended Use Cases:** Intended for commercial and research use in English. Like [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), this model is intended for assistant-like chat.
+- **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
+- **Release Date:** 6/8/2024
+- **Version:** 1.0
+- **License(s):** [Llama3](https://llama.meta.com/llama3/license/)
+- **Model Developers:** Neural Magic
+Quantized version of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
 ```
 lm_eval --model vllm --model_args pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V,kv_cache_dtype=fp8,add_bos_token=True --tasks gsm8k --num_fewshot 5 --batch_size auto
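
For scripted runs, the evaluation command in the README can be assembled from Python instead of being typed by hand. The following is a minimal sketch, not part of the commit: the helper name `build_lm_eval_command` and its parameters are hypothetical, and it only reproduces the flags that already appear in the command above.

```python
import shlex

# Model ID taken from the README's evaluation command.
MODEL_ID = "nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V"


def build_lm_eval_command(
    model_id=MODEL_ID,
    kv_cache_dtype="fp8",
    tasks=("gsm8k",),
    num_fewshot=5,
    batch_size="auto",
):
    """Return the lm_eval invocation as an argv list (hypothetical helper).

    Only the flags shown in the README are emitted; nothing is executed here.
    """
    # --model_args is a single comma-separated key=value string.
    model_args = {
        "pretrained": model_id,
        "kv_cache_dtype": kv_cache_dtype,
        "add_bos_token": True,
    }
    model_args_str = ",".join(f"{key}={value}" for key, value in model_args.items())
    return [
        "lm_eval",
        "--model", "vllm",
        "--model_args", model_args_str,
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe line.
    print(shlex.join(build_lm_eval_command()))
```

With the defaults, the printed line matches the README command. On a machine where `lm_eval` and vLLM are installed, the returned list could be passed to `subprocess.run(cmd, check=True)`; returning an argv list rather than a string avoids shell quoting issues when the model ID or task list changes.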