---
library_name: transformers
tags:
- finance
- economic
license: cc-by-nc-4.0
datasets:
- mncai/orca_dpo_pairs_ko
language:
- ko
- en
base_model: SGEcon/KoSOLAR-10.7B-v0.2_fin_v4
---

## Model Details
Model Developers: Sogang University SGEconFinlab (<https://sc.sogang.ac.kr/aifinlab/>)


## Model Description

This model is a language model specialized in economics and finance. It was trained on a variety of economics- and finance-related data.
The data sources are listed below. We are not releasing the training data itself because it was used for research/policy purposes.
If you wish to use the original data, please contact the original authors directly for permission.

- **Developed by:** Sogang University SGEconFinlab (<https://sc.sogang.ac.kr/aifinlab/>)
- **License:** cc-by-nc-4.0
- **Base Model:** SGEcon/KoSOLAR-10.7B-v0.2_fin_v4 (<https://huggingface.co/SGEcon/KoSOLAR-10.7B-v0.2_fin_v4>)


## Loading the Model

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import PeftConfig, PeftModel

    peft_model_id = "SGEcon/EconFinKoSOLAR-10.7B"
    config = PeftConfig.from_pretrained(peft_model_id)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map={"":0})
    model = PeftModel.from_pretrained(model, peft_model_id)
    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    model.eval()


## Conducting a Conversation

    import re

    def gen(x):
        inputs = tokenizer(f"### 질문: {x}\n\n### 답변:", return_tensors='pt', return_token_type_ids=False)

        # Move inputs to the GPU (if available)
        inputs = {k: v.to(device="cuda" if torch.cuda.is_available() else "cpu") for k, v in inputs.items()}

        gened = model.generate(
            **inputs,
            max_new_tokens=256,  # Maximum number of new tokens to generate
            early_stopping=True,
            num_return_sequences=1,  # Generate only one answer
            do_sample=True,  # Enable sampling to generate a variety of answers
            eos_token_id=tokenizer.eos_token_id,  # Stop at the EOS token
            temperature=0.9,  # This option is adjustable.
            top_p=0.9,  # This option is adjustable.
            top_k=100  # This option is adjustable.
        )

        # Decode the generated sequence into output text
        decoded = tokenizer.decode(gened[0], skip_special_tokens=True).strip()

        # Keep only the text after the "### 답변:" (answer) marker
        answer_start_idx = decoded.find("### 답변:") + len("### 답변:")
        complete_answer = decoded[answer_start_idx:].strip()

        # Trim any incomplete trailing text after the last sentence-ending punctuation mark (. ? !)
        match = re.search(r"[\.\?\!][^\.\?\!]*$", complete_answer)
        if match:
            complete_answer = complete_answer[:match.start() + 1].strip()

        return complete_answer
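
For example, once the model and tokenizer above have been loaded, you can call the helper directly; the question below is just an illustration:

    print(gen("중앙은행의 역할에 대해서 설명해줄래?"))  # "Can you explain the role of the central bank?"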




    
## Training Details

We trained our model using PEFT (LoRA), DPO, and model merging.

- Low-Rank Adaptation (LoRA) freezes the weights of the pretrained model and attaches learnable low-rank decomposition matrices to each transformer layer, updating only these matrices during fine-tuning. In other words, LoRA fine-tunes in a low-dimensional intrinsic-rank subspace (the number of dimensions that best describes the data for a given layer or parameter).

- Parameter-Efficient Fine-Tuning (PEFT) tunes only a small subset of a model's parameters during fine-tuning rather than all of them. Because most parameters stay fixed, the model is less likely to suffer from catastrophic forgetting, where it forgets previously learned tasks while learning new ones, and it can be adapted to different tasks such as question answering, summarization, and generation.

- Direct Preference Optimization (DPO) is an alternative to Reinforcement Learning from Human Feedback (RLHF). RLHF fits a reward model on human preference judgments over multiple LLM answers to the same question and then runs reinforcement learning against that reward model; DPO also uses preference data but trains the policy directly, without an explicit reward model.
To build our DPO dataset, we selected relatively important examples from the data the base model was trained on, asked the base model the questions, and sampled four answers. All four generated answers were marked as rejected and the original reference answer as chosen. We then combined this dataset with the mncai/orca_dpo_pairs_ko dataset published on Hugging Face. A minimal training sketch is shown after this list.

- Merging combines two or more models into a single model. Because merging involves no training, it is very fast and requires only CPU computation. A simple weight-averaging sketch is also shown below.

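Below is a minimal, illustrative sketch of the DPO-with-LoRA setup described above, assuming the Hugging Face peft, trl, and datasets libraries. The exact training script is not released; the output path, dataset handling, and any argument not listed in the hyperparameter table below are assumptions, and the DPOTrainer call signature depends on the trl version (4-bit loading from the table is omitted here for brevity).

    import torch
    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
    from trl import DPOTrainer

    base_id = "SGEcon/KoSOLAR-10.7B-v0.2_fin_v4"
    model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    tokenizer = AutoTokenizer.from_pretrained(base_id)

    # LoRA configuration taken from the hyperparameter table below
    lora_config = LoraConfig(
        r=32,
        lora_alpha=8,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj", "lm_head"],
        task_type="CAUSAL_LM",
    )

    # Preference data: DPOTrainer expects prompt / chosen / rejected columns,
    # so the dataset may need to be renamed or reformatted accordingly.
    dataset = load_dataset("mncai/orca_dpo_pairs_ko", split="train")

    trainer = DPOTrainer(
        model,
        ref_model=None,  # with peft_config set, the frozen base model serves as the reference
        args=TrainingArguments(output_dir="dpo-out", learning_rate=1e-5,
                               lr_scheduler_type="cosine", optim="adamw_torch"),
        train_dataset=dataset,
        tokenizer=tokenizer,
        peft_config=lora_config,
    )
    trainer.train()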

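As an illustration of the merge step, here is a minimal equal-weight (linear) averaging sketch in plain PyTorch. This is not the exact recipe used for this model; the checkpoint paths and the 50/50 weighting are assumptions.

    import torch
    from transformers import AutoModelForCausalLM

    # Hypothetical paths to two fine-tuned checkpoints sharing the same architecture
    model_a = AutoModelForCausalLM.from_pretrained("path/to/model_a", torch_dtype=torch.bfloat16)
    model_b = AutoModelForCausalLM.from_pretrained("path/to/model_b", torch_dtype=torch.bfloat16)

    state_a = model_a.state_dict()
    state_b = model_b.state_dict()

    # Average floating-point tensors parameter by parameter; copy everything else as-is
    merged_state = {}
    for name, tensor_a in state_a.items():
        if tensor_a.is_floating_point():
            merged_state[name] = 0.5 * tensor_a + 0.5 * state_b[name]
        else:
            merged_state[name] = tensor_a

    model_a.load_state_dict(merged_state)
    model_a.save_pretrained("path/to/merged_model")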
 
## Training Data

1. Our DPO dataset
- It may not be used for commercial purposes and is therefore licensed under CC-BY-NC-4.0.
2. mncai/orca_dpo_pairs_ko (<https://huggingface.co/datasets/mncai/orca_dpo_pairs_ko>)




## Training Hyperparameters

|Hyperparameter|SGEcon/KoSOLAR-10.7B-v0.2_fin_v4_dpo|
|------|---|
|LoRA method|LoRA|
|load in 4 bit|True|
|learning rate|1e-5|
|lr scheduler|cosine|
|lora alpha|8|
|lora rank|32|
|lora dropout|0.05|
|optim|adamw_torch|
|target_modules|q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head|


   
## License
This language model is distributed under the Creative Commons Attribution-NonCommercial 4.0 International Public License (CC-BY-NC-4.0).



## Evaluation
We are creating our own economics questions and evaluating the model on them.
There are four tasks in total: Korean-to-English translation of sentences containing economic terms, English-to-Korean translation of sentences containing economic terms, questions on economic terms, and multiple-choice questions on economics.
We will share our progress and results as they become available.



## Example

> Can you explain the role of the central bank?

>> First, let me explain what a central bank does. The central bank acts as a kind of intermediary between the government and economic agents. It is in charge of monetary policy and the stability of the financial system, and it issues legal tender.
>> Through monetary policy, the central bank supports economic growth and manages the economy by controlling prices. If the central bank releases too much money, prices rise, and if it pulls in too much money, the economy stagnates, so the central bank adjusts the money supply to keep the economy stable.
>> The central bank is also responsible for the stability of the financial system. If a bank gets into trouble because it has lent too much, the central bank can provide it with funds so that the stability of the financial system is preserved.
>> Finally, the central bank issues legal tender. Legal tender is the currency designated and guaranteed by the government. It is the money we use every day, and the central bank issues and circulates it so that the economy can run smoothly.
>> In this way, the central bank manages the economy: it conducts monetary policy, safeguards the stability of the financial system, and issues legal tender so that the economy runs smoothly.