SGEcon committed
Commit 0e5835a • 1 Parent(s): acfc617

Update README.md

Files changed (1)
  1. README.md +17 -24
README.md CHANGED
@@ -82,8 +82,15 @@ If you wish to use the original data, please contact the original author directly.


 ## Training Details
- First, we loaded the base model quantized to 4 bits. It can significantly reduce the amount of memory required to store the model's weights and intermediate computation results, which is beneficial for deploying models in environments with limited memory resources. It can also provide faster inference speeds.
- Then,


 ### Training Data
@@ -95,15 +102,15 @@ If you wish to use the original data, please contact the original author directly.
 5. Ministry of SMEs and Startups / Government of the Republic of Korea: Ministry of SMEs and Startups terminology (<https://terms.naver.com/list.naver?cid=42103&categoryId=42103>)
 6. Go Seong-sam / Bobmunsa: Dictionary of Accounting and Tax Terms (<https://terms.naver.com/list.naver?cid=51737&categoryId=51737>)
 7. Word index of Mankiw's Principles of Economics, 8th edition
- 8. yanolja/KoSOLAR-10.7B-v0.2 (<https://huggingface.co/yanolja/KoSOLAR-10.7B-v0.2>)


- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->


- #### Training Hyperparameters

 |Hyperparameter|SGEcon/KoSOLAR-10.7B-v0.2_fin_v4|
 |------|---|
@@ -116,26 +123,12 @@ If you wish to use the original data, please contact the original author directly.
 |lora dropout|0.05|
 |optim|paged_adamw_32bit|
 |target_modules|q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head|
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->

- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]

- #### Summary


- ## Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->


 ## Training Details
+
+ We used QLoRA to fine-tune the base model.
+ QLoRA (Quantized Low-Rank Adapters) is an efficient fine-tuning technique that backpropagates through a frozen, 4-bit-quantized pretrained language model into low-rank adapters, making it possible to fine-tune a 65-billion-parameter model on a single 48 GB GPU while significantly reducing memory usage.
+ To increase memory efficiency without sacrificing performance, it combines NormalFloat 4-bit (NF4), a data type that is information-theoretically optimal for normally distributed weights; Double Quantization, which quantizes the quantization constants themselves to lower the average memory footprint; and Paged Optimizers, which absorb memory spikes during mini-batch processing.
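
A 4-bit NF4 load with Double Quantization might look like the minimal sketch below. It assumes the Hugging Face transformers + bitsandbytes + peft stack and the yanolja/KoSOLAR-10.7B-v0.2 base referenced elsewhere in this card; the exact loading code for this model was not published, so treat this as illustrative rather than the authors' script.

```python
# Minimal QLoRA-style 4-bit load (sketch; base checkpoint name is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store frozen base weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat 4-bit (NF4)
    bnb_4bit_use_double_quant=True,         # Double Quantization of the quant constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "yanolja/KoSOLAR-10.7B-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # ready the 4-bit base for adapters
tokenizer = AutoTokenizer.from_pretrained("yanolja/KoSOLAR-10.7B-v0.2")
```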
+
+ We also performed instruction tuning, using the data we collected together with the kyujinpy/KOR-OpenOrca-Platypus-v3 dataset from the Hugging Face Hub.
+ Instruction tuning is supervised learning in which the instruction and the input are concatenated to form the model input, and the expected output is paired with it as the training target.
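
As a sketch of that pairing, one sample might be rendered as follows. The `format_sample` helper, the Alpaca-style field names (`instruction`, `input`, `output`), and the template are assumptions for illustration; the actual prompt format used for this model is not documented here.

```python
# Hypothetical rendering of one instruction-tuning sample into (input, target).
def format_sample(sample: dict) -> tuple[str, str]:
    """Concatenate instruction and input into the prompt; the output is the label."""
    prompt = f"### Instruction:\n{sample['instruction']}\n"
    if sample.get("input"):
        prompt += f"### Input:\n{sample['input']}\n"
    prompt += "### Response:\n"
    return prompt, sample["output"]

prompt, label = format_sample({
    "instruction": "Explain the role of a central bank.",
    "input": "",
    "output": "A central bank issues currency and conducts monetary policy...",
})
```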


 ### Training Data
 
 5. Ministry of SMEs and Startups / Government of the Republic of Korea: Ministry of SMEs and Startups terminology (<https://terms.naver.com/list.naver?cid=42103&categoryId=42103>)
 6. Go Seong-sam / Bobmunsa: Dictionary of Accounting and Tax Terms (<https://terms.naver.com/list.naver?cid=51737&categoryId=51737>)
 7. Word index of Mankiw's Principles of Economics, 8th edition
+ 8. kyujinpy/KOR-OpenOrca-Platypus-v3 (<https://huggingface.co/datasets/kyujinpy/KOR-OpenOrca-Platypus-v3>)


+ The copyright of the data used belongs to the original authors; please contact them before using it.

+
+ ### Training Hyperparameters

 |Hyperparameter|SGEcon/KoSOLAR-10.7B-v0.2_fin_v4|
 |------|---|
 |lora dropout|0.05|
 |optim|paged_adamw_32bit|
 |target_modules|q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head|
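
To make the table concrete, the sketch below maps these settings onto peft/transformers objects. Only the dropout, optimizer, and target modules appear in this diff hunk; `r` and `lora_alpha` are illustrative placeholders, not the model's published values.

```python
# Hedged sketch of the LoRA configuration implied by the table above.
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,               # placeholder: rank is not shown in this diff hunk
    lora_alpha=32,      # placeholder: alpha is not shown in this diff hunk
    lora_dropout=0.05,  # from the table
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="KoSOLAR-10.7B-v0.2_fin_v4",
    optim="paged_adamw_32bit",  # the paged optimizer from the table
)

# model = get_peft_model(model, lora_config)  # attach adapters to the 4-bit base
```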
 
 
 
 

+ ### Example

+ > Can you explain the role of a central bank?

+ >> A central bank is an institution that holds the authority to issue currency and to control the financial system. It formulates the nation's monetary, foreign-exchange, and financial policies while also supervising and overseeing financial institutions such as commercial banks. The central bank is a lender of funds to the government and to commercial banks; commercial banks borrow funds from, or deposit funds with, the central bank. To carry out monetary and credit policy, the central bank lends funds and takes deposits through financial institutions. Alongside its role as lender to commercial banks, it supervises and oversees them. When a commercial bank extends a loan, instead of first paying out the loan amount, it takes part or all of the loan back as a deposit and lends or deposits that money with the central bank; by raising the interest rate on deposits, it induces depositors to place their money with the central bank. Meanwhile, when a commercial bank makes a loan, it pays the loan amount to the borrowing bank rather than the lending bank depositing it.
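
An output like the one above could be reproduced with a plain transformers generation call, sketched below; the decoding parameters and prompt format are assumptions, not settings documented in this card.

```python
# Hedged sketch of generating from the fine-tuned model with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SGEcon/KoSOLAR-10.7B-v0.2_fin_v4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Can you explain the role of a central bank?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Strip the prompt tokens and print only the completion.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```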