Text Generation
Transformers
PyTorch
Safetensors
English
llama
text-generation-inference
Inference Endpoints
chansurgeplus committed · verified
Commit 2f53cad · 1 Parent(s): fd847fc

Added usage

Files changed (1):
1. README.md +33 -0
README.md CHANGED
@@ -45,6 +45,39 @@ Notice that **no** end-of-sentence (eos) token is being appended.
 
 *Note: The system prompt shown in the following figure is the one that the model has been trained on most of the time. However, you may attempt to use any other system prompt that is available in the [Orca](https://arxiv.org/abs/2306.02707) scheme.*
 
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+checkpoint = "SurgeGlobal/OpenBezoar-SFT"
+
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+
+model = AutoModelForCausalLM.from_pretrained(
+    checkpoint,
+    load_in_4bit=True,  # optional: 4-bit quantization for low-resource environments (requires bitsandbytes)
+    device_map="auto"
+)
+
+prompt = """### System:
+Below is an instruction that describes a task, optionally paired with an input that provides further context following that instruction. Write a response that appropriately completes the request.
+
+### Instruction:
+{instruction}
+
+### Response:""".format(
+    instruction="What is the world state in the year 1597?"
+)
+
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True)
+
+print(tokenizer.decode(outputs[0]))
+```
+
 ## Limitations
 
 - The model might not consistently show improved abilities to follow instructions, and it could respond inappropriately or get stuck in loops.
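
`tokenizer.decode(outputs[0])` in the snippet above returns the prompt together with the completion. A minimal follow-on sketch, assuming the same `inputs` and `outputs` variables from that snippet, that decodes only the newly generated tokens:

```python
# Slice off the prompt tokens so only the model's completion is decoded;
# skip_special_tokens=True drops markers such as the eos token.
prompt_length = inputs["input_ids"].shape[1]
completion = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(completion)
```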
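
The note above suggests that any system prompt from the [Orca](https://arxiv.org/abs/2306.02707) scheme may be tried. A minimal sketch of swapping the system message while keeping the same template; the helper function and the sample system message are illustrative assumptions, not part of the model card:

```python
# Illustrative helper: renders the same "### System / ### Instruction /
# ### Response" template with a caller-supplied system message.
def build_prompt(instruction: str, system: str) -> str:
    return (
        "### System:\n"
        f"{system}\n\n"
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:"
    )

# An Orca-style system message (example only; any message from the scheme may be used).
prompt = build_prompt(
    instruction="Explain why the sky appears blue.",
    system="You are an AI assistant. Provide a detailed answer so the user does not need to search elsewhere to understand it.",
)
```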