# Vietnamese Llama-30b with LoRA Adapters


This repository contains a Vietnamese Llama-30b model fine-tuned with QLoRA (Quantized Low-Rank Adaptation). The adapter is a plug-and-play component that enables the LLaMA model to perform well on many Vietnamese NLP tasks.

Project GitHub page: [GitHub](https://github.com/VietnamAIHub/Vietnamese_LLMs)

## Model Overview

The Vietnamese Llama-30b model is a large language model capable of generating coherent Vietnamese text and can be used in a wide variety of natural language processing tasks, including text generation, sentiment analysis, and more. Because fine-tuning is done through LoRA adapters, the model achieves better performance on low-resource tasks and demonstrates improved generalization.
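If the LoRA adapter weights were ever distributed separately from the merged checkpoint, they could be attached to a base model with the `peft` library. The snippet below is only a minimal sketch of that pattern: both the base checkpoint `huggyllama/llama-30b` and the adapter repo id `VietnamAIHub/Vietnamese_llama_30B_LoRA` are assumptions for illustration, not artifacts documented by this repository (which ships the merged checkpoint loaded further down).

```python
# Minimal sketch: attaching stand-alone LoRA adapter weights with peft.
# Both repo ids are illustrative assumptions; this model card only
# documents the merged checkpoint "VietnamAIHub/Vietnamese_llama_30B_SFT".
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",  # assumed base LLaMA-30b checkpoint
    load_in_8bit=True,       # quantized loading, as in the example below
    device_map="auto",
)
model = PeftModel.from_pretrained(
    base_model,
    "VietnamAIHub/Vietnamese_llama_30B_LoRA",  # hypothetical adapter repo id
)
```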

## Dataset and Fine-Tuning

The LLaMA model was fine-tuned on over 200K instructions from various sources to improve its ability to understand and generate text for different tasks. Details of the instruction dataset and its sources are coming soon.
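For reference, each training example presumably pairs an instruction with a target response rendered through the same prompt template used at inference time below. The record shown here is an illustrative assumption, not a sample from the actual dataset:

```python
# Illustrative only: a hypothetical instruction record and the prompt
# template used at inference time (see the generation example below).
record = {
    "prompt": "Cách để học tập về một môn học thật tốt",  # instruction
    "response": "...",  # target answer (omitted; not from the real dataset)
}

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### prompt:\n{prompt}\n\n### response:\n"
)

training_text = PROMPT_TEMPLATE.format(prompt=record["prompt"]) + record["response"]
```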

## Loading the Model

To load the fine-tuned Vietnamese Llama-30b model with LoRA adapters, follow the code snippet below:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    TextIteratorStreamer,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "VietnamAIHub/Vietnamese_llama_30B_SFT"
cache_dir = "/save_weight_path"

## Loading the LLaMA model weights in 8-bit
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",  # place the 8-bit weights automatically
    trust_remote_code=True,
    cache_dir=cache_dir,
)

## Loading the tokenizer
tok = AutoTokenizer.from_pretrained(
    model_name,
    padding_side="right",
    use_fast=False,          # the fast tokenizer gives issues
    tokenizer_type="llama",  # needed for the HF LLaMA name change
    use_auth_token=True,
    cache_dir=cache_dir,
)

tok.bos_token_id = 1
stop_token_ids = [0]

## Setting the stopping criteria: stop as soon as a stop token is generated
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

stop = StopOnTokens()
streamer = TextIteratorStreamer(tok, timeout=10.0, skip_prompt=True, skip_special_tokens=True)

generation_config = dict(
    temperature=0.2,
    top_k=20,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.2,
    max_new_tokens=1024,
    early_stopping=True,
    stopping_criteria=StoppingCriteriaList([stop]),
    streamer=streamer,
)

## Set your input with the system prompt
input_prompt = "Cách để học tập về một môn học thật tốt"  # "How to study a subject well"
system_prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### prompt:\n{input_prompt}\n\n### response:\n"

inputs = tok(system_prompt, return_tensors="pt")  # add_special_tokens=False may also be worth trying

generation_output = m.generate(
    input_ids=inputs["input_ids"].to(device),
    attention_mask=inputs["attention_mask"].to(device),
    eos_token_id=tok.eos_token_id,
    pad_token_id=tok.pad_token_id,
    **generation_config,
)

s = generation_output[0]
output = tok.decode(s, skip_special_tokens=True)
response = output.split("### response:")[1].strip()
print(response)
```
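Because `generate` blocks until the full sequence is produced, the `TextIteratorStreamer` set up above is normally consumed from a second thread so tokens can be printed as they arrive. Below is a minimal sketch of that pattern, reusing `m`, `tok`, `inputs`, `streamer`, and `generation_config` from the snippet above:

```python
# Minimal streaming sketch: run generation in a background thread and
# consume decoded text chunks from the streamer as they arrive.
from threading import Thread

thread = Thread(
    target=m.generate,
    kwargs=dict(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        eos_token_id=tok.eos_token_id,
        pad_token_id=tok.pad_token_id,
        **generation_config,  # includes streamer=streamer
    ),
)
thread.start()

for new_text in streamer:  # yields decoded text chunks as they arrive
    print(new_text, end="", flush=True)
thread.join()
```

Note that a streamer instance can only be used for a single generation call, so a fresh `TextIteratorStreamer` is needed for each request.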

## Conclusion
The Vietnamese Llama-30b model with LoRA adapters is a versatile language model that can be used for a wide range of Vietnamese NLP tasks. We hope that researchers and developers find this model useful and are encouraged to experiment with it in their projects.

For any questions, feedback, or contributions, please feel free to contact the repository maintainer, TranNhiem 🙌: [LinkedIn](https://www.linkedin.com/in/tran-nhiem-ab1851125/) | [Twitter](https://twitter.com/TranRick2) | [Facebook](https://www.facebook.com/jean.tran.336). Happy fine-tuning and experimenting with the Llama-30b model!