mgoin/Minitron-8B-Base-FP8


mgoin committed on Jul 23

Commit 4851278 • 1 Parent(s): 79684c4

Update README.md

Files changed (1)

README.md +11 -1


@@ -1,3 +1,8 @@
+---
+tags:
+- fp8
+- vllm
+---
 This quantized model:
 ```
@@ -19,4 +24,9 @@ vllm (pretrained=nvidia/Minitron-8B-Base), gen_kwargs: (None), limit: None, num_
 |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
 |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5080|±  |0.0138|
 |     |       |strict-match    |     5|exact_match|↑  |0.5064|±  |0.0138|
-```
+```
+The [original paper](https://arxiv.org/pdf/2407.14679) evals:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/YFmlifuYBVtdfsdPVgV4u.png)