FinchResearch
/

llama2-stable-7b-lora

Question Answering

Model card Files Files and versions Community

llama2-stable-7b-lora / README.md

Marcus Cedric R. Idia

Update README.md

01ffd01 over 1 year ago

|

history blame contribute delete

2.24 kB

	---
	library_name: peft
	license: mit
	datasets:
	- timdettmers/openassistant-guanaco
	- tatsu-lab/alpaca
	- BI55/MedText
	language:
	- en
	pipeline_tag: question-answering
	---
	Here is a README.md explaining how to run the Archimedes model locally:

	# Archimedes Model

	This README provides instructions for running the Archimedes conversational AI assistant locally.

	## Requirements

	- Python 3.6+
	- [Transformers](https://huggingface.co/docs/transformers/installation)
	- [Peft](https://github.com/hazyresearch/peft)
	- PyTorch
	- Access to the LLAMA 2 model files or a cloned public model

	Install requirements:

	```
	!pip install transformers
	!pip install peft
	!pip install torch
	!pip install datasets
	!pip install bitsandbytes
	```

	## Usage

	```python
	import transformers
	from peft import LoraConfig, get_peft_model
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

	login() # Need access to the gated model.

	# Load LLAMA 2 model
	model_name = "meta-llama/Llama-2-7b-chat-hf"

	# Quantization configuration
	bnb_config = BitsAndBytesConfig(
	load_in_4bit=True,
	bnb_4bit_quant_type="nf4",
	bnb_4bit_compute_dtype=torch.float16,
	)

	# Load model
	model = AutoModelForCausalLM.from_pretrained(
	model_name,
	quantization_config=bnb_config,
	trust_remote_code=True
	)

	# Load LoRA configuration
	lora_config = LoraConfig.from_pretrained('harpyerr/archimedes-300s-7b-chat')
	model = get_peft_model(model, lora_config)

	# Load tokenizer
	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
	tokenizer.pad_token = tokenizer.eos_token

	# Define prompt
	text = "Can you tell me who made Space-X?"
	prompt = "You are a helpful assistant. Please provide an informative response. \n\n" + text

	# Generate response
	device = "cuda:0"
	inputs = tokenizer(prompt, return_tensors="pt").to(device)
	outputs = model.generate(**inputs, max_new_tokens=100)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	This loads the LLAMA 2 model, applies 4-bit quantization and LoRA optimizations, constructs a prompt, and generates a response.

	See the [docs](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM) for more details.