---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- quantization
- LLM
- Dolly
---
**Requirements:**
You can run this model on Google Colab Pro; it requires a GPU with a substantial amount of VRAM.
<pre>
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
</pre>
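Before loading the model, it may be worth confirming that a GPU with enough free memory is attached to the runtime. The check below is a generic sanity check, not part of the original instructions:
<pre>
<code>
import torch

assert torch.cuda.is_available(), "No CUDA GPU detected; switch the Colab runtime to a GPU."
free, total = torch.cuda.mem_get_info()
print(torch.cuda.get_device_name(0), f"- {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
</code>
</pre>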
**Load the model using:**
<pre>
<code>
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "AhmedBou/databricks-dolly-v2-3b_on_NCSS"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit precision across the available devices
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
</code>
</pre>
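Since this checkpoint is used for generation only, you can optionally switch the wrapped model to evaluation mode and confirm that the adapter is attached. This is an optional check, not part of the original card:
<pre>
<code>
model.eval()                # disable dropout for inference
print(model.peft_config)    # should show the loaded LoRA adapter configuration
</code>
</pre>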
**Run inference using:**
<pre>
<code>
# Tokenize the prompt and move it to the model's device
batch = tokenizer("Multiple Regression for Appraisal -->: ", return_tensors='pt').to(model.device)

with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=50)

print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))
</code>
</pre>
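To query the model with several prompts, the same steps can be wrapped in a small helper. The function name <code>generate_text</code> below is illustrative and not part of the original card:
<pre>
<code>
def generate_text(prompt, max_new_tokens=50):
    # Convenience wrapper around the inference steps shown above
    batch = tokenizer(prompt, return_tensors='pt').to(model.device)
    with torch.cuda.amp.autocast():
        output_tokens = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print(generate_text("Multiple Regression for Appraisal -->: "))
</code>
</pre>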
**Output:**
<pre>
<code>
“Multiple Regression for Appraisal” -->: Multiple Regression for Appraisal (MRA) -->: Multiple Regression for Appraisal (MRA) (with Covariates) -->: Multiple Regression for Appraisal (MRA) (with Covariates)
</code>
</pre>