Update README.md
README.md CHANGED
@@ -19,8 +19,10 @@ base_model: unsloth/mistral-7b-v0.3-bnb-4bit
 
 This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
+## Inference Example
+
 ```
-# alpaca_prompt =
+# alpaca_prompt = Copy from alpaca
 FastLanguageModel.for_inference(model) # Enable native 2x faster inference
 inputs = tokenizer(
 [
@@ -36,7 +38,6 @@ text_streamer = TextStreamer(tokenizer)
 _ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1280)
 ```
 
-
 ```
 Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
 <s>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
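For context, the `# alpaca_prompt = Copy from alpaca` comment in the diff refers to pasting the standard Alpaca prompt template used in Unsloth's example notebooks — the same template whose first line (`Below is an instruction that describes a task...`) appears in the sample output above. A minimal sketch of that template and how it is filled in for inference (the exact field wording and the `format_prompt` helper are assumptions based on the common Alpaca format, not part of this commit):

```python
# Alpaca-style prompt template (assumed; this is what "Copy from alpaca"
# points at in Unsloth's notebooks).
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def format_prompt(instruction: str, input_text: str = "", response: str = "") -> str:
    # At inference time the response slot is left empty so the model
    # generates the completion after "### Response:".
    return alpaca_prompt.format(instruction, input_text, response)

prompt = format_prompt("Continue the fibonacci sequence.", "1, 1, 2, 3, 5, 8")
```

The resulting string is what would be passed to `tokenizer([...])` in the README's snippet before calling `model.generate` with a `TextStreamer`.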