---
library_name: transformers
license: llama3
language:
- ko
- en
pipeline_tag: text-generation
---
# davidkim205/ko-gemma-2-9b-it
davidkim205/ko-gemma-2-9b-it is one of several models being researched to improve the performance of Korean language models.
(to be released soon)
## Model Details
* **Model Developers** : davidkim (changyeon kim)
* **Repository** : -
* **Base model** : google/gemma-2-9b-it
* **SFT dataset** : qa_ability_1851.jsonl
## Usage
### Chat Template
```
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_id = "davidkim205/ko-gemma-2-9b-it"
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=quantization_config)
chat = [
{ "role": "system", "content":"๋‹น์‹ ์€ ์งˆ๋ฌธ์— ๋Œ€ํ•ด์„œ ์ž์„ธํžˆ ์„ค๋ช…ํ•˜๋Š” AI์ž…๋‹ˆ๋‹ค."},
{ "role": "user", "content": "๋”ฅ๋Ÿฌ๋‹์„ ์–ด๋–ป๊ฒŒ ๊ณต๋ถ€ํ•ด์•ผํ•˜๋‚˜์š”?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1024)
print(tokenizer.decode(outputs[0]))
```
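To make the prompt layout easier to see without loading the model, the sketch below rebuilds the string by hand. The layout is inferred from the sample output in this README (system text directly after `<bos>`, followed by `<start_of_turn>` / `<end_of_turn>` markers). `render_prompt` is a hypothetical helper, not part of transformers, and the tokenizer's own chat template remains the authoritative source.

```python
def render_prompt(chat, add_generation_prompt=True):
    """Approximate the prompt layout seen in this README's sample output.

    Hypothetical helper for illustration only; the real formatting is done
    by tokenizer.apply_chat_template.
    """
    parts = ["<bos>"]
    for message in chat:
        role, content = message["role"], message["content"]
        if role == "system":
            # In the sample output the system text follows <bos> directly.
            parts.append(content)
        else:
            # Assumption: assistant turns use the "model" marker, as in the
            # generation prompt shown in the sample output.
            turn = "model" if role == "assistant" else role
            parts.append(f"<start_of_turn>{turn}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues from here.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


chat = [
    {"role": "system", "content": "You explain things in detail."},
    {"role": "user", "content": "How should I study deep learning?"},
]
print(render_prompt(chat))
```

Comparing this string with the result of `tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)` is a quick way to check whether the assumed layout still matches the template shipped with the model.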
Output:
```
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|██████████| 4/4 [00:04<00:00, 1.04s/it]
/home/david/anaconda3/envs/eval/lib/python3.10/site-packages/bitsandbytes/nn/modules.py:426: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_dtype=torch.float32 (default). This will lead to slow inference or training speed.
warnings.warn(
<bos>๋‹น์‹ ์€ ์งˆ๋ฌธ์— ๋Œ€ํ•ด์„œ ์ž์„ธํžˆ ์„ค๋ช…ํ•˜๋Š” AI์ž…๋‹ˆ๋‹ค.<start_of_turn>user
๋”ฅ๋Ÿฌ๋‹์„ ์–ด๋–ป๊ฒŒ ๊ณต๋ถ€ํ•ด์•ผํ•˜๋‚˜์š”?<end_of_turn>
<start_of_turn>model
๋”ฅ๋Ÿฌ๋‹์„ ๊ณต๋ถ€ํ•˜๋Š” ๊ฒƒ์€ ํฅ๋ฏธ๋กญ๊ณ  ๋ณด๋žŒ ์žˆ๋Š” ์—ฌ์ •์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!
ํ•˜์ง€๋งŒ ์–ด๋””์„œ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ด์•ผ ํ• ์ง€ ๋ง‰๋ง‰ํ•˜๊ฒŒ ๋Š๊ปด์งˆ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
๋‹ค์Œ์€ ๋”ฅ๋Ÿฌ๋‹์„ ๊ณต๋ถ€ํ•˜๊ธฐ ์œ„ํ•œ ๋‹จ๊ณ„๋ณ„ ๊ฐ€์ด๋“œ์ž…๋‹ˆ๋‹ค.
**1๋‹จ๊ณ„: ๊ธฐ์ดˆ ๋‹ค์ง€๊ธฐ**
* **์ˆ˜ํ•™**: ๋”ฅ๋Ÿฌ๋‹์˜ ๊ธฐ๋ฐ˜์ด ๋˜๋Š” ์„ ํ˜•๋Œ€์ˆ˜, ๋ฏธ์ ๋ถ„, ํ™•๋ฅ  ๋ฐ ํ†ต๊ณ„์— ๋Œ€ํ•œ ๊ธฐ๋ณธ ์ง€์‹์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. Khan Academy, Coursera ๋“ฑ ์˜จ๋ผ์ธ ํ”Œ๋žซํผ์—์„œ ์ˆ˜ํ•™ ๊ฐ•์ขŒ๋ฅผ ๋“ฃ๋Š” ๊ฒƒ์„ ์ถ”์ฒœํ•ฉ๋‹ˆ๋‹ค.
* **ํ”„๋กœ๊ทธ๋ž˜๋ฐ**: Python์€ ๋”ฅ๋Ÿฌ๋‹ ๋ถ„์•ผ์—์„œ ๊ฐ€์žฅ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด์ž…๋‹ˆ๋‹ค. Python ๊ธฐ์ดˆ ๋ฌธ๋ฒ•, ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ, ํ•จ์ˆ˜ ๋“ฑ์„ ์ตํžˆ์„ธ์š”. Codecademy, Google's Python Class ๋“ฑ์˜ ํ”Œ๋žซํผ์—์„œ Python์„ ๋ฐฐ์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
* **๊ธฐ๋ณธ ๋จธ์‹ ๋Ÿฌ๋‹**: ๋”ฅ๋Ÿฌ๋‹์„ ์ดํ•ดํ•˜๊ธฐ ์ „์— ๊ธฐ๋ณธ์ ์ธ ๋จธ์‹ ๋Ÿฌ๋‹ ๊ฐœ๋…์„ ์ตํžˆ๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.
* ๋ถ„๋ฅ˜, ํšŒ๊ท€, ํด๋Ÿฌ์Šคํ„ฐ๋ง ๋“ฑ์˜ ๋จธ์‹ ๋Ÿฌ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ดํ•ดํ•˜๊ณ , Scikit-learn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์‹ค์Šต์„ ํ•ด๋ณด์„ธ์š”.
**2๋‹จ๊ณ„: ๋”ฅ๋Ÿฌ๋‹ ๊ฐœ๋… ํ•™์Šต**
* **์˜จ๋ผ์ธ ๊ฐ•์ขŒ**: Coursera, edX, Udacity ๋“ฑ์˜ ํ”Œ๋žซํผ์—์„œ ์ œ๊ณตํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹ ๊ฐ•์ขŒ๋ฅผ ์ˆ˜๊ฐ•ํ•˜์„ธ์š”. Andrew Ng์˜ Deep Learning Specialization์€ ๋”ฅ๋Ÿฌ๋‹ ๋ถ„์•ผ์˜ ๊ธฐ๋ณธ ๊ฐœ๋…์„ ํƒ„ํƒ„ํ•˜๊ฒŒ ๋‹ค์ง€๋Š” ๋ฐ ์ข‹์€ ์„ ํƒ์ž…๋‹ˆ๋‹ค.
* **์ฑ…**: ๋”ฅ๋Ÿฌ๋‹์— ๋Œ€ํ•œ ์ดํ•ด๋ฅผ ์‹ฌํ™”์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์ฑ…์„ ์ฝ๋Š” ๊ฒƒ๋„ ์ข‹์€ ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
* "Deep Learning" (Ian Goodfellow, Yoshua Bengio, Aaron Courville)์€ ๋”ฅ๋Ÿฌ๋‹ ๋ถ„์•ผ์˜ ์ „๋ฌธ๊ฐ€๋ฅผ ์œ„ํ•œ ์‹ฌ๋„ ์žˆ๋Š” ์ฑ…์ž…๋‹ˆ๋‹ค.
* "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" (Aurรฉlien Gรฉron)์€ ์‹ค์Šต ์ค‘์‹ฌ์œผ๋กœ ๋”ฅ๋Ÿฌ๋‹์„ ๋ฐฐ์šฐ๊ณ  ์‹ถ์€ ์‚ฌ๋žŒ์—๊ฒŒ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค.
* **๋ธ”๋กœ๊ทธ ๋ฐ ๊ธฐ์‚ฌ**: ๋”ฅ๋Ÿฌ๋‹ ๊ด€๋ จ ์ตœ์‹  ํŠธ๋ Œ๋“œ์™€ ์—ฐ๊ตฌ ๋™ํ–ฅ์„ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•ด ๋ธ”๋กœ๊ทธ ๋ฐ ๊ธฐ์‚ฌ๋ฅผ ์ฝ๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.
**3๋‹จ๊ณ„: ์‹ค์Šต ๋ฐ ํ”„๋กœ์ ํŠธ ์ง„ํ–‰**
* **๋ฐ์ดํ„ฐ์…‹**: Kaggle, UCI Machine Learning Repository ๋“ฑ์˜ ํ”Œ๋žซํผ์—์„œ ๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ์…‹์„ ์ฐพ์•„ ์‹ค์Šตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
* **๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ**: TensorFlow, PyTorch, Keras ๋“ฑ์˜ ๋”ฅ๋Ÿฌ๋‹ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ๊ตฌ์ถ•ํ•˜๊ณ  ํ›ˆ๋ จํ•˜์„ธ์š”.
* **ํ”„๋กœ์ ํŠธ**: ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ์ˆ ์„ ์ ์šฉํ•˜์—ฌ ์‹ค์ œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ํ”„๋กœ์ ํŠธ๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.
* ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜, ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ, ์˜ˆ์ธก ๋ชจ๋ธ ๊ฐœ๋ฐœ ๋“ฑ ๋‹ค์–‘ํ•œ ํ”„๋กœ์ ํŠธ๋ฅผ ํ†ตํ•ด ๋”ฅ๋Ÿฌ๋‹ ์‹ค๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
**์ถ”๊ฐ€ ํŒ**
* **์ปค๋ฎค๋‹ˆํ‹ฐ ํ™œ๋™**: ๋”ฅ๋Ÿฌ๋‹ ๊ด€๋ จ ์ปค๋ฎค๋‹ˆํ‹ฐ์— ์ฐธ์—ฌํ•˜์—ฌ ๋‹ค๋ฅธ ์‚ฌ๋žŒ๋“ค๊ณผ ๊ต๋ฅ˜ํ•˜๊ณ  ์งˆ๋ฌธ์„ ํ•ด๋ณด์„ธ์š”.
* **๊พธ์ค€ํ•จ**: ๋”ฅ๋Ÿฌ๋‹์€ ๋ณต์žกํ•œ ๋ถ„์•ผ์ด๋ฏ€๋กœ ๊พธ์ค€ํžˆ ๊ณต๋ถ€ํ•˜๊ณ  ์‹ค์Šตํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.
<end_of_turn><eos>
```
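The decoded output above contains the full prompt and the special tokens as well as the answer. When only the model's reply is needed, it can be cut out of the decoded string. The following is a minimal sketch; `extract_reply` is a hypothetical helper, and it assumes the turn markers look exactly like those in the sample output.

```python
def extract_reply(decoded):
    """Return the text of the last model turn, without special tokens."""
    marker = "<start_of_turn>model\n"
    # Keep everything after the last model-turn marker.
    reply = decoded.rsplit(marker, 1)[-1]
    # Drop the closing special tokens if they are present.
    for token in ("<eos>", "<end_of_turn>"):
        reply = reply.replace(token, "")
    return reply.strip()


# Shortened stand-in for the decoded output shown in this README.
decoded = (
    "<bos>system text<start_of_turn>user\n"
    "question<end_of_turn>\n"
    "<start_of_turn>model\n"
    "answer line 1\nanswer line 2\n<end_of_turn><eos>"
)
print(extract_reply(decoded))
```

An alternative that avoids string handling is to decode only the newly generated token IDs, for example by slicing `outputs[0]` past the prompt length and passing `skip_special_tokens=True` to `tokenizer.decode`.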
## Benchmark
### kollm_evaluation
https://github.com/davidkim205/kollm_evaluation
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|-------------------|-------|------|-----:|--------|-----:|---|------|
|kobest |N/A |none | 0|acc |0.5150|± |0.0073|
| | |none | 0|f1 |0.4494|± |N/A |
| - kobest_boolq | 1|none | 0|acc |0.6154|± |0.0130|
| | |none | 0|f1 |0.5595|± |N/A |
| - kobest_copa | 1|none | 0|acc |0.4710|± |0.0158|
| | |none | 0|f1 |0.4700|± |N/A |
| - kobest_hellaswag| 1|none | 0|acc |0.3880|± |0.0218|
| | |none | 0|f1 |0.3832|± |N/A |
| | |none | 0|acc_norm|0.4780|± |0.0224|
| - kobest_sentineg | 1|none | 0|acc |0.5189|± |0.0251|
| | |none | 0|f1 |0.4773|± |N/A |
| - kobest_wic | 1|none | 0|acc |0.4873|± |0.0141|
| | |none | 0|f1 |0.3276|± |N/A |
|ko_truthfulqa | 2|none | 0|acc |0.3390|± |0.0166|
|ko_mmlu | 1|none | 0|acc |0.1469|± |0.0019|
| | |none | 0|acc_norm|0.1469|± |0.0019|
|ko_hellaswag | 1|none | 0|acc |0.2955|± |0.0046|
| | |none | 0|acc_norm|0.3535|± |0.0048|
|ko_common_gen | 1|none | 0|acc |0.5825|± |0.0126|
| | |none | 0|acc_norm|0.5825|± |0.0126|
|ko_arc_easy | 1|none | 0|acc |0.2329|± |0.0124|
| | |none | 0|acc_norm|0.2867|± |0.0132|
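The Stderr column can be turned into an approximate confidence interval, which helps when judging whether two scores really differ. The sketch below uses a normal approximation (value ± 1.96 × stderr for roughly 95%). That choice is an assumption made here for illustration and is not something the evaluation harness reports. `confidence_interval` is a hypothetical helper, and the numbers are copied from the table above.

```python
def confidence_interval(value, stderr, z=1.96):
    """Normal-approximation interval: value ± z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


# (accuracy, stderr) pairs copied from the kollm_evaluation table.
scores = {
    "kobest": (0.5150, 0.0073),
    "ko_truthfulqa": (0.3390, 0.0166),
    "ko_common_gen": (0.5825, 0.0126),
}

for task, (acc, stderr) in scores.items():
    low, high = confidence_interval(acc, stderr)
    print(f"{task}: {acc:.4f} (approx. 95% CI {low:.4f} to {high:.4f})")
```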
### Evaluation of KEval
keval is an evaluation model trained on the prompts and dataset of a benchmark for evaluating Korean language models. It is one of several approaches to evaluating models with ChatGPT, and it is intended to compensate for the shortcomings of the existing lm-evaluation-harness.
https://huggingface.co/davidkim205/keval-7b
| model | ned | exe_time | evalscore | count |
|:-----------------------------------------------------------------------------------------|------:|-----------:|------------:|--------:|
| claude-3-opus-20240229 | nan | nan | 8.79 | 42 |
| gpt-4-turbo-2024-04-09 | nan | nan | 8.71 | 42 |
| Qwen2-72B-Instruct | nan | 29850.5 | 7.85 | 42 |
| WizardLM-2-8x22B | nan | 133831 | 7.57 | 42 |
| ***ko-gemma-2-9b-it*** | nan | 30789.5 | 7.52 | 42 |
| HyperClovaX | nan | nan | 7.44 | 42 |
| gemma-2-9b-it | nan | 23531.7 | 7.4 | 42 |
| glm-4-9b-chat | nan | 24825.6 | 7.31 | 42 |
| Ko-Llama-3-8B-Instruct | nan | 10697.5 | 6.81 | 42 |
| Qwen2-7B-Instruct | nan | 11856.3 | 6.02 | 42 |
| Not-WizardLM-2-7B | nan | 12955.7 | 5.26 | 42 |
| gemma-1.1-7b-it | nan | 6950.5 | 4.99 | 42 |
| Mistral-7B-Instruct-v0.3 | nan | 19631.4 | 4.89 | 42 |
| Phi-3-small-128k-instruct | nan | 26747.5 | 3.52 | 42 |
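As a small reading aid for the table above, the sketch below ranks the models by `evalscore` and reports the gap between ko-gemma-2-9b-it and its base model. The scores are copied from the table. The code only rearranges them and is not part of keval.

```python
# evalscore values copied from the KEval table.
evalscores = {
    "claude-3-opus-20240229": 8.79,
    "gpt-4-turbo-2024-04-09": 8.71,
    "Qwen2-72B-Instruct": 7.85,
    "WizardLM-2-8x22B": 7.57,
    "ko-gemma-2-9b-it": 7.52,
    "HyperClovaX": 7.44,
    "gemma-2-9b-it": 7.4,
    "glm-4-9b-chat": 7.31,
    "Ko-Llama-3-8B-Instruct": 6.81,
    "Qwen2-7B-Instruct": 6.02,
    "Not-WizardLM-2-7B": 5.26,
    "gemma-1.1-7b-it": 4.99,
    "Mistral-7B-Instruct-v0.3": 4.89,
    "Phi-3-small-128k-instruct": 3.52,
}

# Sort from highest to lowest score.
ranked = sorted(evalscores.items(), key=lambda item: item[1], reverse=True)

target = "ko-gemma-2-9b-it"
rank = [name for name, _ in ranked].index(target) + 1
gain = evalscores[target] - evalscores["gemma-2-9b-it"]

print(f"{target}: rank {rank} of {len(ranked)}, {gain:+.2f} vs. gemma-2-9b-it")
```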