mwitiderrick committed on
Commit
0a52c68
1 Parent(s): 2bd1d9c

Update README.md

  1. README.md +77 -14
README.md CHANGED
@@ -1,24 +1,87 @@
  ---
  base_model: teknium/OpenHermes-2.5-Mistral-7B
- inference: True
  model_type: mistral
  ---
- # OpenHermes-2.5-Mistral-7B
- This repo contains pruned model files for [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B).

  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).

  ```python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- model_id = "nm-testing/OpenHermes-2.5-Mistral-7B-pruned50-24"
- model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- inputs = tokenizer("Hello my name is", return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=20)
- print(tokenizer.batch_decode(outputs)[0])
  """
- <s> Hello my name is John. I am a 20 year old male. I am a student at a university.
  """
- ```
  ---
  base_model: teknium/OpenHermes-2.5-Mistral-7B
+ inference: true
  model_type: mistral
+ quantized_by: mgoin
+ tags:
+ - nm-vllm
+ - sparse
  ---
+
+ ## OpenHermes-2.5-Mistral-7B-pruned50
+ This repo contains model files for [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) optimized for [NM-vLLM](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).

+ ## Inference
+ Install [NM-vLLM](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
+ ```bash
+ pip install "nm-vllm[sparse]"
+ ```
+ Run in a Python pipeline for local inference:
  ```python
+ from vllm import LLM, SamplingParams
+
+ model = LLM("nm-testing/OpenHermes-2.5-Mistral-7B-pruned2.4", sparsity="sparse_w16a16")
+ prompt = "How to make banana bread?"
+ formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
+
+ sampling_params = SamplingParams(max_tokens=100)
+ outputs = model.generate(formatted_prompt, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
+ """
+ In order to make banana bread, you will need to follow these steps:
+
+ 1. Prepare the ingredients: You will need flour, sugar, eggs, and bananas.
+ 2. Prepare your ingredients: Prepare your bananas, flour, sugar, and eggs by preparing them in their respective bowls, ready to prepare the banana bread.
+ 3. Make the batter: You will prepare batter by combining the flour, sugar, eggs and bananas. This
  """
+ ```
+
+ ## Prompt template
+
+ ```
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
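
The template above can also be applied programmatically. The following is a minimal sketch, not part of the committed README: `format_chatml_prompt` is a hypothetical helper name, and it assumes a single user turn with no system message, exactly as the template shows.

```python
from typing import Iterable, List


def format_chatml_prompt(prompt: str) -> str:
    """Wrap one user message in the prompt template shown in the README.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"


def format_chatml_prompts(prompts: Iterable[str]) -> List[str]:
    """Apply the same template to several user messages."""
    return [format_chatml_prompt(p) for p in prompts]


if __name__ == "__main__":
    # Print the formatted prompt so the inserted special tokens are visible.
    print(format_chatml_prompt("How to make banana bread?"))
```

The resulting strings can be passed to `model.generate` in place of the hand-built `formatted_prompt` from the inference example.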
+
+ ## Sparsification
+ For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
+
+ Install [SparseML](https://github.com/neuralmagic/sparseml):
+ ```bash
+ git clone https://github.com/neuralmagic/sparseml
+ pip install -e "sparseml[transformers]"
+ ```
+
+ Replace the recipe as you like and run this one-shot compression script to apply SparseGPT:
+ ```python
+ import sparseml.transformers
+
+ original_model_name = "teknium/OpenHermes-2.5-Mistral-7B"
+ calibration_dataset = "open_platypus"
+ output_directory = "output/"
+
+ # Raw string so the regex escape \d in `targets` does not trigger an invalid-escape warning
+ recipe = r"""
+ test_stage:
+   obcq_modifiers:
+     SparseGPTModifier:
+       sparsity: 0.5
+       sequential_update: true
+       mask_structure: '2:4'
+       targets: ['re:model.layers.\d*$']
  """
+
+ # Apply SparseGPT to the model
+ sparseml.transformers.oneshot(
+     model=original_model_name,
+     dataset=calibration_dataset,
+     recipe=recipe,
+     output_dir=output_directory,
+ )
+ ```
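
The recipe's `sparsity: 0.5` and `mask_structure: '2:4'` settings can be illustrated with a small check. This is a sketch only, not part of the committed README or of SparseML: the function names are hypothetical, and it assumes that '2:4' means at least two of every four consecutive weights in a row are zero.

```python
from typing import Sequence


def follows_n_of_m(weights: Sequence[float], n: int = 2, m: int = 4) -> bool:
    """Return True if every consecutive group of m weights has at least n zeros.

    Hypothetical helper for illustration; not a SparseML function.
    """
    if len(weights) % m != 0:
        raise ValueError(f"row length {len(weights)} is not a multiple of {m}")
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        zeros = sum(1 for w in group if w == 0)
        if zeros < n:
            return False
    return True


def sparsity(weights: Sequence[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


if __name__ == "__main__":
    pruned_row = [0.0, 0.3, 0.0, -1.2, 0.5, 0.0, 0.0, 0.1]
    dense_row = [0.1, 0.2, 0.0, 0.4]
    print(follows_n_of_m(pruned_row), sparsity(pruned_row))
    print(follows_n_of_m(dense_row), sparsity(dense_row))
```

Under that assumption, a row that satisfies the 2:4 pattern has a sparsity of at least 0.5, which matches the two values set in the recipe.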
+
+ ## Slack
+
+ For further support and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).