SGEcon committed on
Commit 96dfd64 • 1 Parent(s): f0e5c4b

Update README.md

  1. README.md +13 -8
README.md CHANGED
@@ -13,11 +13,11 @@ base_model: yanolja/KoSOLAR-10.7B-v0.2
  ---


- # Model Details
  Model Developers: Sogang University SGEconFinlab(<<https://sc.sogang.ac.kr/aifinlab/>)


- ### Model Description

  This model is a language model specialized in economics and finance. This was learned with various economic/finance-related data.
  The data sources are listed below, and we are not releasing the data that we trained on because it was used for research/policy purposes.
@@ -65,14 +65,14 @@ If you wish to use the original data, please contact the original author directl
  top_k=100 # This option is adjustable.
  )

- # 생성된 시퀀스를 디코드하여 출력 텍스트로 변환
  decoded = tokenizer.decode(gened[0], skip_special_tokens=True).strip()

- # "### 답변:" 문자열 이후의 텍스트만 추출
  answer_start_idx = decoded.find("### 답변:") + len("### 답변:")
  complete_answer = decoded[answer_start_idx:].strip()

- # 첫 번째 구두점(. ? !)을 찾아서 그 부분까지만 추출
  match = re.search(r"[\.\?\!][^\.\?\!]*$", complete_answer)
  if match:
      complete_answer = complete_answer[:match.end()].strip()
@@ -94,7 +94,7 @@ Instruction tuning is learning in a supervised learning format that uses instruc



- ### Training Data

  1. 한국은행: 경제금융용어 700선(<https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765>)
  2. 금융감독원: 금융소비자 정보 포털 파인 금융용어사전(<https://fine.fss.or.kr/fine/fnctip/fncDicary/list.do?menuNo=900021>)
@@ -112,7 +112,7 @@ The copyright of the data used belongs to the original author, so please contact



- ### Training Hyperparameters

  |Hyperparameter|SGEcon/KoSOLAR-10.7B-v0.2_fin_v4|
  |------|---|
@@ -126,9 +126,14 @@ The copyright of the data used belongs to the original author, so please contact
  |optim|paged_adamw_32bit|
  |target_modules|q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head|



- ### Example
  We only removed duplicate sentences.

  ---


+ ## Model Details
  Model Developers: Sogang University SGEconFinlab(<<https://sc.sogang.ac.kr/aifinlab/>)


+ ## Model Description

  This model is a language model specialized in economics and finance. This was learned with various economic/finance-related data.
  The data sources are listed below, and we are not releasing the data that we trained on because it was used for research/policy purposes.
 
  top_k=100 # This option is adjustable.
  )

+ # Decode the generated sequence and convert it to output text
  decoded = tokenizer.decode(gened[0], skip_special_tokens=True).strip()

+ # Extract only text after a string "### 답변:"
  answer_start_idx = decoded.find("### 답변:") + len("### 답변:")
  complete_answer = decoded[answer_start_idx:].strip()

+ # Find the first punctuation mark (. ? !) and extract only up to it
  match = re.search(r"[\.\?\!][^\.\?\!]*$", complete_answer)
  if match:
      complete_answer = complete_answer[:match.end()].strip()
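
Reviewer note on the post-processing shown in this hunk: the pattern ends in `$`, so `re.search` lands on the last sentence-ending mark rather than the first, and slicing at `match.end()` returns the whole string unchanged. Separately, `str.find` returns -1 when the marker is missing, which shifts the slice start to an arbitrary offset. The sketch below is one possible way to get "cut after the last complete sentence" behaviour. It is not the author's code: the helper name `extract_answer`, the fallback for a missing marker, and the sample string are all assumptions made for illustration.

```python
import re

# Marker string taken from the README snippet (Korean for "Answer").
ANSWER_MARKER = "### 답변:"


def extract_answer(decoded: str, marker: str = ANSWER_MARKER) -> str:
    """Hypothetical helper: text after `marker`, cut after the last full sentence."""
    # str.find returns -1 when the marker is absent, so check before slicing.
    idx = decoded.find(marker)
    if idx == -1:
        answer = decoded.strip()  # assumed fallback: keep the whole text
    else:
        answer = decoded[idx + len(marker):].strip()

    # Match the last '.', '?' or '!': one terminator followed only by
    # non-terminator characters up to the end of the string.
    match = re.search(r"[.?!][^.?!]*$", answer)
    if match:
        # Keep everything up to and including that terminator.
        answer = answer[: match.start() + 1].strip()
    return answer


if __name__ == "__main__":
    # Invented sample output with a trailing, unfinished sentence.
    sample = "### 답변: 금리는 돈의 가격입니다. 기준금리는 중앙은행이"
    print(extract_answer(sample))
```

Using `match.start() + 1` drops a trailing unfinished sentence. Text with no terminator at all is returned unchanged.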
 



+ ## Training Data

  1. 한국은행: 경제금융용어 700선(<https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765>)
  2. 금융감독원: 금융소비자 정보 포털 파인 금융용어사전(<https://fine.fss.or.kr/fine/fnctip/fncDicary/list.do?menuNo=900021>)
 



+ ## Training Hyperparameters

  |Hyperparameter|SGEcon/KoSOLAR-10.7B-v0.2_fin_v4|
  |------|---|
 
  |optim|paged_adamw_32bit|
  |target_modules|q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head|

+

+ ## License
+ The language identification model is distributed under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
+
+

+ ## Example
  We only removed duplicate sentences.
