# Vietnamese Llama-30b with LoRA Adapters

This repository contains a Vietnamese Llama-30b model fine-tuned with QLoRA (Quantized Low-Rank Adaptation) adapters. The adapters plug into the base LLaMa model and enable it to perform well on many Vietnamese NLP tasks.

Project GitHub page: [GitHub](https://github.com/VietnamAIHub/Vietnamese_LLMs)

## Model Overview

The Vietnamese Llama-30B model is a large language model that generates meaningful text and can be used for a wide variety of natural language processing tasks, including text generation and sentiment analysis. By using LoRA adapters, the model achieves better performance on low-resource tasks and generalizes better.
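
To give a rough sense of why low-rank adapters are cheap to train, the sketch below compares the number of weights in one dense projection matrix with the number of weights in a pair of low-rank factors. The hidden size and the rank are assumptions chosen for illustration; they are not taken from this repository's training configuration.

```python
# Illustrative arithmetic only: hidden_size and rank are assumed values,
# not settings confirmed by this repository.

def full_params(d_in: int, d_out: int) -> int:
    """Number of weights in a dense d_out x d_in projection matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Number of weights in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


hidden_size = 6656  # assumed hidden size for a 30B-class LLaMA model
rank = 64           # hypothetical LoRA rank

dense = full_params(hidden_size, hidden_size)
adapter = lora_params(hidden_size, hidden_size, rank)

print(f"dense projection: {dense:,} weights")
print(f"LoRA adapter:     {adapter:,} weights ({adapter / dense:.2%} of dense)")
```

Under these assumptions the adapter trains only a small fraction of the weights of each projection it is attached to, while the quantized base weights stay frozen.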

## Dataset and Fine-Tuning

The LLaMa model was fine-tuned on over 200K instructions from various sources to improve its ability to understand and generate text for different tasks. The list of sources that make up the instruction dataset is coming soon.

## Loading the Model

To load the fine-tuned Vietnamese Llama-30b model with LoRA adapters, use the code snippet below:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    TextIteratorStreamer,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "VietnamAIHub/Vietnamese_llama_30B_SFT"
cache_dir = "/save_weight_path"

# Load the LLaMa model weights in 8-bit precision
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    trust_remote_code=True,
    cache_dir=cache_dir,
)

# Load the tokenizer
tok = AutoTokenizer.from_pretrained(
    model_name,
    padding_side="right",
    use_fast=False,  # The fast tokenizer was giving issues.
    tokenizer_type="llama",  # Needed for the HF name change
    use_auth_token=True,
    cache_dir=cache_dir,
)

tok.bos_token_id = 1
stop_token_ids = [0]


# Stop generation as soon as the last generated token is a stop token
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False


stop = StopOnTokens()
# The streamer is only useful when its tokens are consumed from another thread
streamer = TextIteratorStreamer(tok, timeout=10.0, skip_prompt=True, skip_special_tokens=True)

generation_config = dict(
    temperature=0.2,
    top_k=20,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.2,
    max_new_tokens=1024,
    early_stopping=True,
    stopping_criteria=StoppingCriteriaList([stop]),
    streamer=streamer,
)

# Set your input and wrap it in the system prompt
# (the example input means "How to study a subject really well")
input_prompt = "Cách để học tập về một môn học thật tốt"
system_prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### prompt:\n{input_prompt}\n\n### response:\n"

inputs = tok(system_prompt, return_tensors="pt")  # add_special_tokens=False ?

generation_output = m.generate(
    input_ids=inputs["input_ids"].to(device),
    attention_mask=inputs["attention_mask"].to(device),
    eos_token_id=tok.eos_token_id,
    pad_token_id=tok.pad_token_id,
    **generation_config,
)

# Decode the full sequence and keep only the text after the response marker
s = generation_output[0]
output = tok.decode(s, skip_special_tokens=True)
response = output.split("### response:")[1].strip()
print(response)
```
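
The prompt wrapping and response parsing in the snippet above can be factored into two small helpers, which makes them easy to reuse and to check without loading the model. This is a minimal sketch: the names `PROMPT_TEMPLATE`, `RESPONSE_MARKER`, `build_prompt`, and `extract_response` are hypothetical and not part of this repository. The template text itself is copied from the loading example.

```python
# Hypothetical helpers; only the template wording comes from the loading example.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### prompt:\n{instruction}\n\n### response:\n"
)
RESPONSE_MARKER = "### response:"


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the prompt template used for generation."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker.

    If the marker is missing, the whole decoded text is returned
    instead of raising an IndexError.
    """
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


prompt = build_prompt("Cách để học tập về một môn học thật tốt")
print(prompt)
```

With these helpers, `tok(build_prompt(input_prompt), return_tensors="pt")` would replace the hand-built f-string, and `extract_response(output)` would replace the `split(...)[1]` call. Note that `partition` keeps everything after the first marker, whereas `split(...)[1]` stops at a second marker if the model happens to emit one.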

## Conclusion

The Vietnamese Llama-30b model with LoRA adapters is a versatile language model that can be used for a wide range of NLP tasks in Vietnamese. We hope that researchers and developers find this model useful, and we encourage them to experiment with it in their projects.

For any questions, feedback, or contributions, please feel free to contact the maintainer of this repository, TranNhiem 🙌: [LinkedIn](https://www.linkedin.com/in/tran-nhiem-ab1851125/), [Twitter](https://twitter.com/TranRick2), [Facebook](https://www.facebook.com/jean.tran.336). Happy fine-tuning and experimenting with the Llama-30b model!