---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- quantization
- LLM
- Dolly
---
**Requirements:**
You can run this model on Google Colab Pro; it requires a GPU with a substantial amount of VRAM.
<pre>
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
</pre>
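Before loading the model, it may be worth confirming that a GPU with enough free memory is attached to the runtime. The check below is a generic sanity check, not part of the original instructions:
<pre>
<code>
import torch

assert torch.cuda.is_available(), "No CUDA GPU detected; switch the Colab runtime to a GPU."
free, total = torch.cuda.mem_get_info()
print(torch.cuda.get_device_name(0), f"- {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
</code>
</pre>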
**Load the model using:**
<pre>
<code>
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "AhmedBou/databricks-dolly-v2-3b_on_NCSS"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit precision across the available devices
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
</code>
</pre>
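Since this checkpoint is used for generation only, you can optionally switch the wrapped model to evaluation mode and confirm that the adapter is attached. This is an optional check, not part of the original card:
<pre>
<code>
model.eval()                # disable dropout for inference
print(model.peft_config)    # should show the loaded LoRA adapter configuration
</code>
</pre>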
**Run inference using:**
<pre>
<code>
# Tokenize the prompt and move it to the model's device
batch = tokenizer("Multiple Regression for Appraisal -->: ", return_tensors='pt').to(model.device)

with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=50)

print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))
</code>
</pre>
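To query the model with several prompts, the same steps can be wrapped in a small helper. The function name <code>generate_text</code> below is illustrative and not part of the original card:
<pre>
<code>
def generate_text(prompt, max_new_tokens=50):
    # Convenience wrapper around the inference steps shown above
    batch = tokenizer(prompt, return_tensors='pt').to(model.device)
    with torch.cuda.amp.autocast():
        output_tokens = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print(generate_text("Multiple Regression for Appraisal -->: "))
</code>
</pre>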
**Output:**
<pre>
<code>
“Multiple Regression for Appraisal” -->: Multiple Regression for Appraisal (MRA) -->: Multiple Regression for Appraisal (MRA) (with Covariates) -->: Multiple Regression for Appraisal (MRA) (with Covariates)
</code>
</pre>