pek111 committed on
Commit
fe31026
1 Parent(s): d35990b

Update README.md

Files changed (1)
  1. README.md +113 -13
README.md CHANGED
@@ -89,19 +89,19 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 | Name | Quant method | Bits | Size | Use case |
 | ---- | ---- | ---- | ---- | ---- |
- | [tc-instruct-dpo.Q2_K.gguf](/tc-instruct-dpo.Q2_K.gguf) | Q2_K | 2 | 2.88 GB | smallest, significant quality loss - not recommended for most purposes |
- | [tc-instruct-dpo.Q3_K_S.gguf](/tc-instruct-dpo.Q3_K_S.gguf) | Q3_K_S | 3 | 2.96 GB | very small, high quality loss |
- | [tc-instruct-dpo.Q3_K_M.gguf](/tc-instruct-dpo.Q3_K_M.gguf) | Q3_K_M | 3 | 3.29 GB | very small, high quality loss |
- | [tc-instruct-dpo.Q3_K_L.gguf](/tc-instruct-dpo.Q3_K_L.gguf) | Q3_K_L | 3 | 3.57 GB | small, substantial quality loss |
- | [tc-instruct-dpo.Q4_0.gguf](/tc-instruct-dpo.Q4_0.gguf) | Q4_0 | 4 | 3.84 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
- | [tc-instruct-dpo.Q4_K_S.gguf](/tc-instruct-dpo.Q4_K_S.gguf) | Q4_K_S | 4 | 3.87 GB | small, greater quality loss |
- | [tc-instruct-dpo.Q4_K_M.gguf](/tc-instruct-dpo.Q4_K_M.gguf) | Q4_K_M | 4 | 4.08 GB | medium, balanced quality - recommended |
- | [tc-instruct-dpo.Q5_0.gguf](/tc-instruct-dpo.Q5_0.gguf) | Q5_0 | 5 | 4.67 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
- | [tc-instruct-dpo.Q5_K_S.gguf](/tc-instruct-dpo.Q5_K_S.gguf) | Q5_K_S | 5 | 4.67 GB | large, low quality loss - recommended |
- | [tc-instruct-dpo.Q5_K_M.gguf](/tc-instruct-dpo.Q5_K_M.gguf) | Q5_K_M | 5 | 4.79 GB | large, very low quality loss - recommended |
- | [tc-instruct-dpo.Q6_K.gguf](/tc-instruct-dpo.Q6_K.gguf) | Q6_K | 6 | 5.55 GB | very large, extremely low quality loss |
- | [tc-instruct-dpo.Q8_0.gguf](/tc-instruct-dpo.Q8_0.gguf) | Q8_0 | 8 | 7.19 GB | very large, extremely low quality loss - not recommended |
- | [tc-instruct-dpo.QF16.gguf](/tc-instruct-dpo.Q8_0.gguf) | QF16 | 16 | 13.53 GB | largest, lowest quality loss - highly not recommended |
+ | [tc-instruct-dpo.Q2_K.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q2_K.gguf) | Q2_K | 2 | 2.88 GB | smallest, significant quality loss - not recommended for most purposes |
+ | [tc-instruct-dpo.Q3_K_S.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q3_K_S.gguf) | Q3_K_S | 3 | 2.96 GB | very small, high quality loss |
+ | [tc-instruct-dpo.Q3_K_M.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q3_K_M.gguf) | Q3_K_M | 3 | 3.29 GB | very small, high quality loss |
+ | [tc-instruct-dpo.Q3_K_L.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q3_K_L.gguf) | Q3_K_L | 3 | 3.57 GB | small, substantial quality loss |
+ | [tc-instruct-dpo.Q4_0.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q4_0.gguf) | Q4_0 | 4 | 3.84 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
+ | [tc-instruct-dpo.Q4_K_S.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q4_K_S.gguf) | Q4_K_S | 4 | 3.87 GB | small, greater quality loss |
+ | [tc-instruct-dpo.Q4_K_M.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q4_K_M.gguf) | Q4_K_M | 4 | 4.08 GB | medium, balanced quality - recommended |
+ | [tc-instruct-dpo.Q5_0.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q5_0.gguf) | Q5_0 | 5 | 4.67 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
+ | [tc-instruct-dpo.Q5_K_S.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q5_K_S.gguf) | Q5_K_S | 5 | 4.67 GB | large, low quality loss - recommended |
+ | [tc-instruct-dpo.Q5_K_M.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q5_K_M.gguf) | Q5_K_M | 5 | 4.79 GB | large, very low quality loss - recommended |
+ | [tc-instruct-dpo.Q6_K.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q6_K.gguf) | Q6_K | 6 | 5.55 GB | very large, extremely low quality loss |
+ | [tc-instruct-dpo.Q8_0.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.Q8_0.gguf) | Q8_0 | 8 | 7.19 GB | very large, extremely low quality loss - not recommended |
+ | [tc-instruct-dpo.QF16.gguf](https://huggingface.co/pek111/TC-instruct-DPO-GGUF/blob/main/tc-instruct-dpo.QF16.gguf) | QF16 | 16 | 13.53 GB | largest, lowest quality loss - not recommended |
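+
+ For example, a single quant from this table can be fetched and run locally. The sketch below uses huggingface_hub plus llama-cpp-python (one of several GGUF runtimes, not the only option); the repo id and filename come from the table above, while the context length and prompt are illustrative assumptions:
+
+ ```python
+ # Minimal sketch: download the recommended Q4_K_M quant and run one prompt.
+ from huggingface_hub import hf_hub_download
+ from llama_cpp import Llama
+
+ # Downloads the file into the local Hugging Face cache and returns its path.
+ model_path = hf_hub_download(
+     repo_id="pek111/TC-instruct-DPO-GGUF",
+     filename="tc-instruct-dpo.Q4_K_M.gguf",  # the "recommended" row above
+ )
+
+ llm = Llama(model_path=model_path, n_ctx=2048)  # context length is an assumption
+ # The model card below says the model expects an Alpaca-style template.
+ output = llm("### Instruction:\nสวัสดี\n\n### Response:\n", max_tokens=128)
+ print(output["choices"][0]["text"])
+ ```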
 
 # Inference Code
 
@@ -153,3 +153,103 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(f"Response time: {time.time() - st_time} seconds")
 print(response)
 ```
+
+ # Original model card: tanamettpk's TC Instruct DPO - Typhoon 7B
+
+ # TC-instruct-DPO - Typhoon 7B
+
+ ![image/png](https://i.seadn.io/gae/5rw87qeBGr0f4ieGyXPkLXaiVsQt_jYCI-2yjMn4W9rK3GBwy68W_3lO-ST_YPtAzhRBxb7ONhMe4YyYZNWM368dVGYnWGv6CIyYhA?auto=format&dpr=1&w=1400&fr=1)
+
+ ## Model Description
+
+ TC instruct DPO was fine-tuned from SCB 10X's Typhoon 7B, which is itself based on Mistral 7B - v0.1.
+
+ TC instruct DPO was trained on as much Thai-language data as we could find, and we tried to make the instructions as diverse as we could.
+
+ This model was made purely to study the process of building an LLM.
+
+ And as said, it was a learning exercise: we had never built an LLM before, nor studied the subject all that thoroughly.
+
+ So we did several dumb things; for example, we used the Alpaca prompt template, only to find out later that we should have used ChatML instead.
+
+ We trained this model with QLoRA at rank 32, alpha 64 (a rough sketch of such a setup is shown below).
+
+ We trained with a custom Hugging Face script (don't do this; switch to axolotl or unsloth instead and save your money).
+
+ We used a single H100 PCIE 80 GB from vast.ai at about $3/hr. Training this model alone took about 21 hours, but including all the trial and error it came to roughly 10k THB.
+
+ We used batch size 24 (we really wanted 32 but hit OOM, and 16 seemed a waste; come on, with an H100 80 GB why train on only 40 GB?).
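+
+ A minimal sketch of what that QLoRA setup could look like with peft and bitsandbytes; the rank and alpha come from the card, while the base model id, target modules, and other settings are assumptions rather than the author's exact configuration:
+
+ ```python
+ # Hypothetical QLoRA setup (rank 32, alpha 64, per the card above).
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,                      # QLoRA keeps base weights in 4-bit
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+ model = AutoModelForCausalLM.from_pretrained(
+     "scb10x/typhoon-7b",                    # Typhoon 7B base (assumed repo id)
+     quantization_config=bnb_config,
+     device_map={"": 0},
+ )
+ model = prepare_model_for_kbit_training(model)
+
+ lora_config = LoraConfig(
+     r=32,                                   # rank 32, as stated in the card
+     lora_alpha=64,                          # alpha 64, as stated in the card
+     lora_dropout=0.05,                      # assumption
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # common choice for Mistral-family models
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(model, lora_config)
+ model.print_trainable_parameters()
+ ```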
+
+ ## If this model helps you, a donation would be much appreciated
+ Tipme: https://bit.ly/3m3uH5p
+
+ # Prompt Format
+ ```
+ ### Instruction:
+ จะทำอะไรก็เรื่องของมึง
+
+ ### Response:
+ ด่าผมอีกสิครับ
+ ```
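+
+ In other words, the user's request goes under `### Instruction:` and the model's reply is generated after `### Response:` (the Thai example above playfully asks the model to insult the user). A small helper such as this hypothetical one can assemble prompts in that format:
+
+ ```python
+ # Hypothetical helper for the Alpaca-style template shown above.
+ def build_prompt(instruction: str) -> str:
+     # Leave "### Response:" open so the model completes it.
+     return f"### Instruction:\n{instruction}\n\n### Response:\n"
+
+ print(build_prompt("ด่าฉันด้วยคำหยาบคายหน่อย"))
+ ```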
+
+ # Inference Code
+
+ Here is example code using HuggingFace Transformers to run inference with the model (note: loaded in 4-bit, it will require around 5 GB of VRAM).
+
+ ```python
+ # Requires pytorch, transformers, bitsandbytes, sentencepiece, protobuf, and flash-attn packages
+
+ import time
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GenerationConfig
+
+ base_model_id = "tanamettpk/TC-instruct-DPO"
+
+ input_text = """
+ ### Instruction:
+ ด่าฉันด้วยคำหยาบคายหน่อย
+
+ ### Response:
+ """
+
+ # Load the model in 4-bit; this is what keeps VRAM usage around 5 GB, as noted above.
+ model = AutoModelForCausalLM.from_pretrained(
+     base_model_id,
+     quantization_config=BitsAndBytesConfig(load_in_4bit=True),
+     low_cpu_mem_usage=True,
+     return_dict=True,
+     device_map={"": 0},
+ )
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+
+ generation_config = GenerationConfig(
+     do_sample=True,
+     top_k=1,
+     temperature=0.5,
+     max_new_tokens=300,
+     repetition_penalty=1.1,
+     pad_token_id=tokenizer.eos_token_id,
+ )
+
+ # Tokenize input
+ inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
+
+ # Generate outputs
+ st_time = time.time()
+ outputs = model.generate(**inputs, generation_config=generation_config)
+
+ # Decode and print response
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(f"Response time: {time.time() - st_time} seconds")
+ print(response)
+ ```
+
+ # How to cite:
+
+ ```bibtex
+ @misc{TC-instruct-DPO,
+   url={https://huggingface.co/tanamettpk/TC-instruct-DPO},
+   title={TC-instruct-DPO},
+   author={tanamettpk and tanamettpk and tanamettpk and tanamettpk}
+ }
+ ```