Commit deac860 by Sreenington (parent: 533b773)
Update README.md
README.md CHANGED
@@ -73,8 +73,33 @@ Assistant:
 
 ## How to use
 
+### using vLLM
+```python
+from vllm import LLM, SamplingParams
+
+# Sample prompts.
+prompts = [
+    "Hello, how are you?"
+]
+# Create a sampling params object.
+sampling_params = SamplingParams(max_tokens=128)
+
+# Create an LLM.
+llm = LLM(model="Sreenington/Llama-3-8B-ChatQA-AWQ", quantization="AWQ")
+# Generate texts from the prompts. The output is a list of RequestOutput objects
+# that contain the prompt, generated text, and other information.
+outputs = llm.generate(prompts, sampling_params)
+
+# Print the outputs.
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+```
+
 ### take the whole document as context
-This can be applied to the scenario where the whole document can be fitted into the model, so that there is no need to run retrieval over the document
+This can be applied to the scenario where the whole document can be fitted into the model, so that there is no need to run retrieval over the document
+
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
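The committed vLLM example sends a bare prompt, while the rest of the README formats queries in the ChatQA "System: … User: … Assistant:" layout (the `Assistant:` fragment in the hunk header comes from that section). As a rough sketch of how the two additions fit together, the snippet below builds such a prompt around a whole document and generates with vLLM. The system message, placeholder document, and exact prompt layout are assumptions modeled on the ChatQA family, not text from this commit, so verify them against the prompt-format section of the README.

```python
from vllm import LLM, SamplingParams

# Assumed ChatQA-style layout: system message, then the whole document as context,
# then the user turn, ending with "Assistant:" so the model continues from there.
# Verify the exact wording against the prompt-format section of the README.
system = ("System: This is a chat between a user and an artificial intelligence "
          "assistant. The assistant gives helpful, detailed, and polite answers "
          "to the user's questions based on the context.")
document = "<full document text goes here>"   # whole document used as context
question = "User: What does the document say about pricing?"  # example question

prompt = f"{system}\n\n{document}\n\n{question}\n\nAssistant:"

llm = LLM(model="Sreenington/Llama-3-8B-ChatQA-AWQ", quantization="AWQ")
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

Greedy decoding (`temperature=0.0`) is a common choice for answering over a fixed document; the sampling settings are otherwise the same kind used in the committed example.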