neuralmagic/Llama-2-7b-pruned50-retrained

Text Generation

text-generation-inference

Inference Endpoints

mgoin committed on Mar 15

Commit 64364a5 • 1 Parent(s): 5198e7f

Update README.md

Files changed (1)

README.md +3 -1

@@ -18,12 +18,14 @@ This repo contains model files for a [Llama 2 7B](https://huggingface.co/meta-ll
 Below we share some code snippets on how to get quickly started with running the model.
-### Fine-tuning examples
+### Sparse Fine-tuning examples
 Coming soon.
 ### Running the model
+This model has not been fine-tuned for instruction-following but may be run with the transformers library. For accelerated inference with sparsity, deploy with [nm-vllm](https://github.com/neuralmagic/nm-vllm) or [deepsparse](https://github.com/neuralmagic/deepsparse).
 ```python
 # pip install transformers accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM