Sreenington committed
Commit: deac860
Parent: 533b773

Update README.md

Files changed (1):
  README.md (+26, -1)
README.md CHANGED
@@ -73,8 +73,33 @@ Assistant:
 
 ## How to use
 
+### using vLLM
+```python
+from vllm import LLM, SamplingParams
+
+# Sample prompts.
+prompts = [
+    "Hello, how are you?"
+]
+# Create a sampling params object.
+sampling_params = SamplingParams(max_tokens=128)
+
+# Create an LLM.
+llm = LLM(model="Sreenington/Llama-3-8B-ChatQA-AWQ", quantization="AWQ")
+# Generate texts from the prompts. The output is a list of RequestOutput objects
+# that contain the prompt, generated text, and other information.
+outputs = llm.generate(prompts, sampling_params)
+
+# Print the outputs.
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+```
+
 ### take the whole document as context
-This can be applied to the scenario where the whole document can be fitted into the model, so that there is no need to run retrieval over the document.
+This can be applied when the whole document fits into the model's context window, so there is no need to run retrieval over the document.
+
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
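
The trailing context of the hunk cuts off at the imports of the "take the whole document as context" example. A minimal sketch of how that flow might continue with this AWQ checkpoint (assuming the stock `transformers` generation API; the prompt template, document text, question, and generation settings below are illustrative placeholders, not the README's verbatim code):

```python
# Hypothetical continuation of the truncated example above, not the README's
# verbatim code. Loading an AWQ checkpoint through transformers requires the
# autoawq package to be installed.
model_id = "Sreenington/Llama-3-8B-ChatQA-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder document and question: the whole document goes into the prompt,
# so no retrieval step is needed.
document = "ChatQA is a family of conversational QA models built on Llama 3."
question = "What is ChatQA built on?"

prompt = (
    "System: Answer the question using the given context.\n\n"
    f"{document}\n\nUser: {question}\n\nAssistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```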