|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- irlab-udc/alpaca_data_galician |
|
language: |
|
- gl |
|
- en |
|
--- |
|
|
|
# Llama3-8B LoRA Adapter for the Galician Language
|
|
|
This repository houses a LoRA (Low-Rank Adaptation) adapter for fine-tuning Meta's Llama 3 8B Instruct model for applications involving the Galician language. The adapter efficiently adapts the pre-trained model, originally trained on a broad range of data and languages, to better understand and generate text in Galician.
|
|
|
## Adapter Description |
|
|
|
This LoRA adapter has been fine-tuned specifically to understand and generate text in Galician. It was trained on a modified version of the [irlab-udc/alpaca_data_galician](https://huggingface.co/datasets/irlab-udc/alpaca_data_galician) dataset, enriched with synthetic data to enhance its text generation and comprehension capabilities in specific contexts.
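To take a quick look at the public dataset, the `datasets` library is enough. A minimal sketch (the `train` split name and the usual Alpaca-style fields are assumptions about the dataset's schema):

```python
from datasets import load_dataset

# Load the Galician Alpaca dataset from the Hub and inspect one record.
# The "train" split and instruction/input/output fields are assumptions
# based on the usual Alpaca layout.
ds = load_dataset("irlab-udc/alpaca_data_galician", split="train")
print(ds)     # dataset size and column names
print(ds[0])  # one instruction/response pair in Galician
```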
|
|
|
### Technical Details |
|
|
|
- **Base Model**: [unsloth/llama-3-8b-Instruct-bnb-4bit](https://huggingface.co/unsloth/llama-3-8b-Instruct-bnb-4bit), Unsloth's 4-bit quantization of Meta's Llama 3 8B Instruct
|
- **Fine-Tuning Platform**: LLaMA Factory |
|
- **Infrastructure**: Finisterrae III Supercomputer, CESGA (Galicia-Spain) |
|
- **Dataset**: [irlab-udc/alpaca_data_galician](https://huggingface.co/datasets/irlab-udc/alpaca_data_galician) (with modifications) |
|
- **Fine-Tuning Objective**: To improve text comprehension and generation in Galician. |
|
|
|
### Training parameters
|
|
|
The project is still in the testing phase, and the training parameters will continue to be tuned in search of a more accurate model. Currently, the adapter is trained on **5000 random entries** from the dataset with the following values (see the sketch after this list):
|
|
|
- `num_train_epochs=3.0`
- `finetuning_type="lora"`
- `per_device_train_batch_size=2`
- `gradient_accumulation_steps=4`
- `lr_scheduler_type="cosine"`
- `learning_rate=5e-5`
- `max_grad_norm=1.0`
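For reference, here is a minimal LLaMA Factory training sketch with these values, written against the same `llmtuner` API this card uses for inference below. It is not the exact configuration used for this adapter: the `dataset` name assumes a matching entry in LLaMA Factory's `dataset_info.json`, `lora_target` and `output_dir` are illustrative, and `max_samples` only approximates the random 5000-entry subset.

```python
from llmtuner import run_exp

# Supervised fine-tuning (SFT) of the 4-bit base model with a LoRA adapter.
run_exp(dict(
    stage="sft",
    do_train=True,
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",
    dataset="alpaca_data_galician",     # assumes the dataset is registered in dataset_info.json
    template="llama3",
    finetuning_type="lora",
    lora_target="all",                  # illustrative: adapt all linear layers
    output_dir="llama3_galician_lora",  # illustrative output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    learning_rate=5e-5,
    num_train_epochs=3.0,
    max_grad_norm=1.0,
    max_samples=5000,                   # only approximates the random 5000-entry subset
    quantization_bit=4,
    use_unsloth=True,
))
```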
|
|
|
## Testing The Adapter |
|
|
|
Below are several question-answer pairs comparing the base Llama3 model with the Llama3 model fine-tuned for the Galician language.
|
|
|
*Llama3-8B* 👎 |
|
```bash |
|
User: Cal é a capital de Canadá? |
|
Assistant: Não! Cal não é a capital de Canadá. A capital de Canadá é Ottawa. |
|
``` |
|
*Llama3-8B-Galician-Chat-Lora* ✅ 👍 |
|
```bash |
|
User: Cal é a capital de Canadá? |
|
Assistant: A capital de Canadá é Ottawa. |
|
``` |
|
*Llama3-8B* 👎 |
|
```bash |
|
User: Cantos habitantes ten Galicia? |
|
Assistant: Según el Instituto Nacional de Estadística (INE) de España, la población residente en Galicia al 1 de enero de 2021 era de aproximadamente 2.79 millones de personas. |
|
``` |
|
*Llama3-8B-Galician-Chat-Lora* ✅ 👍 |
|
```bash |
|
User: Cantos habitantes ten Galicia? |
|
Assistant: Segundo as últimas estimacións, Galicia ten uns 2,8 millóns de habitantes. |
|
``` |
|
|
|
## How to Use the Adapter |
|
|
|
To use this adapter, follow the example code provided below. Ensure you have the necessary libraries installed (e.g., Hugging Face's `transformers`). |
|
|
|
### Installation |
|
|
|
Download the adapter from Hugging Face:
|
```bash |
|
git clone https://huggingface.co/abrahammg/Llama3-8B-Galician-Chat-Lora |
|
``` |
|
Install dependencies: |
|
```bash |
|
pip install transformers bitsandbytes "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" llmtuner xformers |
|
``` |
|
|
|
### Run the adapter |
|
|
|
Create a Python script (e.g., `run_model.py`):
|
|
|
```python
from llmtuner import ChatModel
from llmtuner.extras.misc import torch_gc

chat_model = ChatModel(dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # bnb-4bit-quantized Llama-3-8B-Instruct model
    adapter_name_or_path="./",  # load the Llama3-8B-Galician-Chat-Lora adapter
    finetuning_type="lora",
    template="llama3",
    quantization_bit=4,   # load the 4-bit quantized model
    use_unsloth=True,     # use UnslothAI's LoRA optimization for 2x faster generation
))

messages = []
while True:
    query = input("\nUser: ")
    if query.strip() == "exit":
        break

    if query.strip() == "clear":
        messages = []
        torch_gc()
        print("History has been removed.")
        continue

    messages.append({"role": "user", "content": query})
    print("Assistant: ", end="", flush=True)

    response = ""
    for new_text in chat_model.stream_chat(messages):
        print(new_text, end="", flush=True)
        response += new_text
    print()
    messages.append({"role": "assistant", "content": response})

torch_gc()
```
|
and run it:
|
```bash |
|
python run_model.py |
|
``` |
|
|
|
## Full Merged Model 💬
|
|
|
You can find the adapter merged with the Llama3-8B base model in this repository: [https://huggingface.co/abrahammg/Llama3-8B-Galician-Instruct-GGUF](https://huggingface.co/abrahammg/Llama3-8B-Galician-Instruct-GGUF)
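For reference, a merge like this can be reproduced with `peft`'s `merge_and_unload`. This is a minimal sketch, not necessarily how the model above was produced; the full-precision base repository is an assumption (the official `meta-llama` repo is gated and requires access):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full-precision base model (gated repo, requires access approval),
# apply the LoRA adapter, then bake the adapter weights into the base model.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, "abrahammg/Llama3-8B-Galician-Chat-Lora")
merged = model.merge_and_unload()

# Save the standalone merged model together with its tokenizer.
merged.save_pretrained("llama3-8b-galician-merged")
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tok.save_pretrained("llama3-8b-galician-merged")
```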
|
|
|
To use this model in LM Studio, simply enter the URL https://huggingface.co/abrahammg/Llama3-8B-Galician-Instruct-GGUF into the search box. For the best performance, make sure the chat template is set to Llama3.
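If you want to check what the Llama3 template produces, you can render a turn with the tokenizer's chat template; a minimal sketch:

```python
from transformers import AutoTokenizer

# Render one user turn with the Llama 3 chat template so you can compare it
# against the prompt format your front end (e.g. LM Studio) is using.
tok = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-Instruct-bnb-4bit")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Cal é a capital de Canadá?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # shows the <|start_header_id|> / <|eot_id|> markers
```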
|
Alternatively, you can pull it in **Ollama** with the command:
|
```bash |
|
ollama run abrahammg/llama3-gl-chat |
|
``` |
|
|
|
## Acknowledgements
|
|
|
- [meta-llama/llama3](https://github.com/meta-llama/llama3) |
|
- [hiyouga/LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) |
|
- [irlab-udc/alpaca_data_galician](https://huggingface.co/datasets/irlab-udc/alpaca_data_galician) |
|
|