Commit 8eff3d4
Parent(s): 96afdcc
Committed by TheBloke

Update README.md

Files changed (1): README.md (+8, −2)
README.md CHANGED
@@ -41,7 +41,13 @@ Currently these files will also not work with code that previously supported Fal
 
 * [2, 3, 4, 5, 6, 8-bit GGCC models for CPU+GPU inference](https://huggingface.co/TheBloke/falcon-40b-sft-mix-1226-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/OpenAssistant/falcon-40b-sft-mix-1226)
-
+
+## Prompt template
+
+```
+<|prompter|>prompt<|endoftext|><|assistant|>
+```
+
 <!-- compatibility_ggml start -->
 ## Compatibility
 
@@ -57,7 +63,7 @@ Compiling on Windows: developer cmp-nct notes: 'I personally compile it using VS
 
 Once compiled you can then use `bin/falcon_main` just like you would use llama.cpp. For example:
 ```
-bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon-40b-sft-mix-1226.ggccv1.q4_K.bin -p "What is a falcon?\n### Response:"
+bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon-40b-sft-mix-1226.ggccv1.q4_K.bin -p "<|prompter|>write a story about llamas<|endoftext|><|assistant|>"
 ```
 
 You can specify `-ngl 100` regardless of your VRAM, as it will automatically detect how much VRAM is available to be used.
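
For illustration only (not part of the commit): a minimal sketch of how the new prompt template combines with the `falcon_main` invocation above. The question text and variable names are assumptions; the flags and model filename are taken from the README.

```
#!/usr/bin/env bash
# Minimal sketch: wrap a user question in the prompt template added by
# this commit, then pass it to falcon_main with the flags shown in the
# README. The question text here is an assumption for demonstration.
QUESTION="What is a falcon?"
PROMPT="<|prompter|>${QUESTION}<|endoftext|><|assistant|>"

bin/falcon_main -t 8 -ngl 100 -b 1 \
  -m falcon-40b-sft-mix-1226.ggccv1.q4_K.bin \
  -p "${PROMPT}"
```

With this template, the model's generation continues from the text after the `<|assistant|>` token, so everything it prints from that point on is the reply.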