Update README.md
README.md
CHANGED
@@ -13,11 +13,11 @@ base_model: yanolja/KoSOLAR-10.7B-v0.2
 ---


-
+## Model Details
 Model Developers: Sogang University SGEconFinlab(<<https://sc.sogang.ac.kr/aifinlab/>)


-
+## Model Description

 This model is a language model specialized in economics and finance. This was learned with various economic/finance-related data.
 The data sources are listed below, and we are not releasing the data that we trained on because it was used for research/policy purposes.
@@ -65,14 +65,14 @@ If you wish to use the original data, please contact the original author directl
     top_k=100 # This option is adjustable.
 )

-#
+# Decode the generated sequence and convert it to output text
 decoded = tokenizer.decode(gened[0], skip_special_tokens=True).strip()

-# "### 답변:"
+# Extract only text after a string "### 답변:"
 answer_start_idx = decoded.find("### 답변:") + len("### 답변:")
 complete_answer = decoded[answer_start_idx:].strip()

-#
+# Find the first punctuation mark (. ? !) and extract only up to it
 match = re.search(r"[\.\?\!][^\.\?\!]*$", complete_answer)
 if match:
     complete_answer = complete_answer[:match.end()].strip()
@@ -94,7 +94,7 @@ Instruction tuning is learning in a supervised learning format that uses instruc



-
+## Training Data

 1. 한국은행: 경제금융용어 700선 (<https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765>)
 2. 금융감독원: 금융소비자 정보 포털 파인 금융용어사전 (<https://fine.fss.or.kr/fine/fnctip/fncDicary/list.do?menuNo=900021>)
@@ -112,7 +112,7 @@ The copyright of the data used belongs to the original author, so please contact



-
+## Training Hyperparameters

 |Hyperparameter|SGEcon/KoSOLAR-10.7B-v0.2_fin_v4|
 |------|---|
@@ -126,9 +126,14 @@ The copyright of the data used belongs to the original author, so please contact
 |optim|paged_adamw_32bit|
 |target_modules|q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head|

+

+## License
+The language identification model is distributed under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
+
+

-
+## Example
 We only removed duplicate sentences.


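Note on the post-processing lines in the second hunk: the pattern `[\.\?\!][^\.\?\!]*$` is anchored at the end of the string, so it appears to locate the last punctuation mark rather than the first, and `match.end()` then equals the length of the string, which would make the slice `complete_answer[:match.end()]` keep everything. The following is a minimal sketch of what the comments seem to intend (drop a trailing unfinished sentence), not the README's code. The function name `extract_answer` and the fallback for a missing marker are assumptions made for illustration, and the marker is assumed to be `### 답변:`.

```python
import re

# Prompt marker assumed from the README snippet ("### Answer:" in Korean).
ANSWER_MARKER = "### 답변:"


def extract_answer(decoded: str) -> str:
    """Return the text after the answer marker, cut at the last complete sentence.

    Hypothetical helper: the name and the missing-marker fallback are
    illustrative assumptions, not part of the README.
    """
    # str.find returns -1 when the marker is absent; in that case this sketch
    # keeps the whole text instead of slicing from a bogus offset.
    marker_idx = decoded.find(ANSWER_MARKER)
    start = marker_idx + len(ANSWER_MARKER) if marker_idx != -1 else 0
    answer = decoded[start:].strip()

    # Matches the LAST '.', '?' or '!' plus whatever unfinished text follows it.
    match = re.search(r"[.?!][^.?!]*$", answer)
    if match:
        # Keep everything up to and including that punctuation mark.
        answer = answer[: match.start() + 1].strip()
    return answer
```

With this variant, a decoded string such as `"### 답변: Prices rise. Money loses value. And then the"` is trimmed so that the dangling "And then the" fragment is dropped, while text without any sentence-ending punctuation is returned unchanged.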