|
# Vietnamese Llama-30b with LoRA Adapters |
|
|
|
|
|
This repository contains a Vietnamese Llama-30b model fine-tuned with QLoRA (Quantized Low-Rank Adaptation) adapters. The adapter plugs into the base LLaMa model and enables it to perform well on many Vietnamese NLP tasks.
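If you have the LoRA adapter checkpoint separately rather than the merged weights published on the Hub, it can be attached to a base LLaMa model with the `peft` library. The sketch below is illustrative only: the base checkpoint id `huggyllama/llama-30b` and the adapter path are placeholders, not artifacts released by this project.

```python
# Minimal sketch: attaching a LoRA adapter to a base LLaMa model with peft.
# Both the base checkpoint id and the adapter path below are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",       # placeholder base checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "path/to/lora_adapter")  # placeholder adapter path
model = model.merge_and_unload()  # fold the adapter weights into the base model
```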
|
|
|
Project GitHub page: [GitHub](https://github.com/VietnamAIHub/Vietnamese_LLMs)
|
|
|
## Model Overview |
|
|
|
The Vietnamese Llama-30b model is a large language model capable of generating fluent Vietnamese text and can be used for a wide variety of natural language processing tasks, including text generation, sentiment analysis, and more. By using LoRA adapters, the model achieves better performance on low-resource tasks and demonstrates improved generalization.
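For a quick smoke test, the model can also be driven through the standard `transformers` text-generation pipeline. This is a minimal sketch: the generation settings are illustrative, and `device_map="auto"` assumes enough GPU memory for the 30B weights.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="VietnamAIHub/Vietnamese_SFT_llama_30B_v1",
    torch_dtype="auto",   # let transformers pick a suitable dtype
    device_map="auto",    # shard the 30B weights across available devices
)
out = generator("Xin chào, bạn có khỏe không?", max_new_tokens=50)  # "Hello, how are you?"
print(out[0]["generated_text"])
```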
|
|
|
## Dataset and Fine-Tuning |
|
|
|
The LLaMa model was fine-tuned on over 200K instructions from various sources to improve its ability to understand and generate text for different tasks. The instruction dataset comprises data from the following sources:

Dataset details are coming soon.
|
|
|
## Loading the Model |
|
|
|
To load the fine-tuned Vietnamese Llama-30b model with LoRA adapters, use the code snippet below:
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "VietnamAIHub/Vietnamese_SFT_llama_30B_v1"
cache_dir = "/save_weight_path"

## Load the fine-tuned weights (base LLaMa model merged with the adapter weights)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},  # place the whole model on GPU 0
    cache_dir=cache_dir,
)

## Load the tokenizer and pin the BOS token id to the LLaMa convention
tok = LlamaTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
tok.bos_token_id = 1

generation_config = dict(
    temperature=0.2,
    top_k=20,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.2,
    max_new_tokens=400,
    early_stopping=True,
)

## Wrap the user prompt in the instruction template used during fine-tuning
prompt = "Cách để học tập về một môn học thật tốt"  # "How to study a subject well"
message = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### prompt:\n{prompt}\n\n### response:\n"
)

inputs = tok(message, return_tensors="pt")
generation_output = model.generate(
    input_ids=inputs["input_ids"].to(device),
    attention_mask=inputs["attention_mask"].to(device),
    eos_token_id=tok.eos_token_id,
    pad_token_id=tok.pad_token_id,
    **generation_config,
)

## Decode and strip the template, keeping only the model's response
output = tok.decode(generation_output[0], skip_special_tokens=True)
response = output.split("### response:")[1].strip()
print(response)
```
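Because the model was fine-tuned with QLoRA, a natural memory-saving option is to load it with 4-bit NF4 quantization via `bitsandbytes`. The sketch below is a variant of the loading step above, not part of the original recipe; it assumes the `bitsandbytes` package is installed and trades some output quality for a much smaller footprint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer

## 4-bit NF4 quantization config, mirroring the QLoRA training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "VietnamAIHub/Vietnamese_SFT_llama_30B_v1"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tok = LlamaTokenizer.from_pretrained(model_name)
```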
|
|
|
## Conclusion |
|
The Vietnamese Llama-30b model with LoRA adapters is a versatile language model that can be applied to a wide range of Vietnamese NLP tasks. We hope researchers and developers find it useful and are encouraged to experiment with it in their projects.
|
|
|
For any questions, feedback, or contributions, please feel free to contact the repository maintainer, TranNhiem 🙌: [LinkedIn](https://www.linkedin.com/in/tran-nhiem-ab1851125/) [Twitter](https://twitter.com/TranRick2) [Facebook](https://www.facebook.com/jean.tran.336). Happy fine-tuning and experimenting with the Llama-30b model!
|
|