---
language:
- ko
- en
---
## Model Details
Model Developers: Sogang University SGEconFinlab (<https://sc.sogang.ac.kr/aifinlab/>)


## Model Description

This model is a language model specialized in economics and finance. It was trained on a variety of economics- and finance-related data.
The data sources are listed below. We are not releasing the training data because it was collected for research and policy purposes.
If you wish to use the original data, please contact the original authors directly for permission.

- **Developed by:** Sogang University SGEconFinlab (<https://sc.sogang.ac.kr/aifinlab/>)
- **License:** cc-by-nc-4.0
- **Base Model:** SGEcon/KoSOLAR-10.7B-v0.2_fin_v4 (<https://huggingface.co/SGEcon/KoSOLAR-10.7B-v0.2_fin_v4>)


## Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftConfig, PeftModel

peft_model_id = "SGEcon/KoSOLAR-10.7B-v0.2_fin_v4_dpo"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model with 4-bit NF4 quantization to reduce memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map={"": 0})
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model.eval()
```

## Conducting Conversation

```python
import re

def gen(x):
    inputs = tokenizer(f"### 질문: {x}\n\n### λ‹΅λ³€:", return_tensors='pt', return_token_type_ids=False)

    # Move the inputs to the GPU (if available)
    inputs = {k: v.to(device="cuda" if torch.cuda.is_available() else "cpu") for k, v in inputs.items()}

    gened = model.generate(
        **inputs,
        max_new_tokens=256,                   # Maximum number of new tokens to generate
        early_stopping=True,
        num_return_sequences=1,               # Generate only one answer
        do_sample=True,                       # Enable sampling to generate a variety of answers
        eos_token_id=tokenizer.eos_token_id,  # Stop at the end-of-sequence token
        temperature=0.9,                      # This option is adjustable.
        top_p=0.8,                            # This option is adjustable.
        top_k=100                             # This option is adjustable.
    )

    # Decode the generated sequence into output text
    decoded = tokenizer.decode(gened[0], skip_special_tokens=True).strip()

    # Keep only the text after the "### λ‹΅λ³€:" marker
    answer_start_idx = decoded.find("### λ‹΅λ³€:") + len("### λ‹΅λ³€:")
    complete_answer = decoded[answer_start_idx:].strip()

    # Truncate after the last sentence-ending punctuation mark (. ? !)
    # so that an unfinished trailing sentence is dropped
    match = re.search(r"[\.\?\!][^\.\?\!]*$", complete_answer)
    if match:
        complete_answer = complete_answer[:match.start() + 1].strip()

    return complete_answer
```
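
A minimal usage sketch (the prompt here is illustrative; any Korean or English question works, e.g. the one shown in the Example section at the end of this card):

```python
# Illustrative call only; not part of the original card.
print(gen("Can you explain the role of a central bank?"))
```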


## Training Details

We trained our model with PEFT, LoRA, DPO, and model merging.

- Low-Rank Adaptation (LoRA) freezes the weights of the pretrained model and attaches trainable rank-decomposition matrices to each layer of the transformer, updating only these matrices during fine-tuning. In other words, LoRA fine-tunes through a low intrinsic rank (the small number of dimensions that best describe the update for a given layer or parameter).

- Parameter-Efficient Fine-Tuning (PEFT) tunes only a small subset of a model's parameters during fine-tuning rather than all of them. Because most parameters stay fixed, the model is less likely to suffer from catastrophic forgetting, where previously learned tasks are lost when new ones are learned, and it can be adapted cheaply to different tasks such as question answering, summarization, and generation.

- Direct Preference Optimization (DPO) is an alternative to Reinforcement Learning from Human Feedback (RLHF). RLHF builds a reward model from human preferences over multiple LLM answers to the same question and then runs reinforcement learning against that reward model; DPO also uses preference data but optimizes the policy directly, without an explicit reward model.
  We selected relatively important examples from the data the base model was trained on, prompted the base model with each question, and sampled four answers. All four generated answers were labeled as rejected and the original reference answer as chosen to build our DPO dataset (a sketch of this record format follows this list). We then combined it with the mncai/orca_dpo_pairs_ko dataset published on Hugging Face.

- Merging combines two or more models into a single model. Because merging involves no training, it is very fast and requires only CPU computation.
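
As an illustration of the record format described above (our data itself is not released, so the field names and contents below are assumptions following the common prompt/chosen/rejected convention):

```python
# Hypothetical example of one DPO preference record; not from the actual dataset.
# Each question can yield several such records, one per rejected (model-generated) answer.
dpo_record = {
    "prompt": "### 질문: Explain the role of a central bank.\n\n### λ‹΅λ³€:",
    "chosen": "<the original reference answer from the source data>",
    "rejected": "<one of the four answers sampled from the base model>",
}
```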


## Training Data

1. Our DPO dataset
   - It may not be used for commercial purposes and is therefore released under the CC-BY-NC-4.0 license.
2. mncai/orca_dpo_pairs_ko (<https://huggingface.co/datasets/mncai/orca_dpo_pairs_ko>)


## Training Hyperparameters

|Hyperparameter|SGEcon/KoSOLAR-10.7B-v0.2_fin_v4_dpo|
|------|---|
|LoRA method|LoRA|
|load in 4 bit|True|
|learning rate|1e-5|
|lr scheduler|cosine|
|lora alpha|8|
|lora rank|32|
|lora dropout|0.05|
|optim|adamw_torch|
|target_modules|q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head|
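
For reference, a minimal sketch of how the table above could map onto a `peft` LoraConfig and `transformers` TrainingArguments; the actual training script is not released, so treat names like `output_dir` and the overall setup as assumptions:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Sketch reconstructed from the hyperparameter table; not the released training script.
lora_config = LoraConfig(
    r=32,              # lora rank
    lora_alpha=8,      # lora alpha
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="kosolar_fin_v4_dpo",  # hypothetical output path
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
)

# "load in 4 bit = True" corresponds to the BitsAndBytesConfig shown in "Loading the Model".
```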


## License

This model is distributed under the Creative Commons Attribution-NonCommercial 4.0 International Public License.


## Example

> Can you explain the role of a central bank?

>> A central bank is a government institution that manages a country's monetary and financial system. Its main functions are as follows: 1. Managing the money supply: the central bank adjusts the money supply to stabilize prices and promote economic growth. To do this, it sets the policy interest rate, manages liquidity, and buys or sells assets. 2. Stabilizing the economy: the central bank adjusts monetary policy to keep the economy from overheating or slipping into recession. For example, if the economy overheats, the central bank can raise the policy rate to cool it; conversely, if the economy slumps, it can cut the policy rate to stimulate growth. 3. Supervising the financial system: the central bank supervises and regulates banks and other financial institutions to ensure the stability of the financial system, including setting risk-management, capital, and supervisory requirements. 4. Managing foreign exchange: the central bank formulates and implements foreign-exchange policy to stabilize the foreign-exchange market.