metadata
library_name: peft
base_model: meta-llama/Llama-2-7b-chat-hf
Model Card for Model ID
euneeei/hw-llama-2-7B-nsmc
Training Details
Training Data
A Korean-language Naver movie review dataset (NSMC).
train dataset: 3,000 examples
test dataset: 1,000 examples
Training result: best accuracy 0.87
1. Reused, unchanged, the @dataclass parameters that reached 0.91 accuracy with midm
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.81 | 0.91 | 0.85 | 492 |
| positive | 0.90 | 0.79 | 0.84 | 508 |
| accuracy | | | 0.85 | 1000 |
| macro avg | 0.85 | 0.85 | 0.85 | 1000 |
| weighted avg | 0.85 | 0.85 | 0.85 | 1000 |
Confusion matrix:
[[446, 46],
 [106, 402]]
Accuracy was 0.85, short of 0.90, so I decided to adjust the learning rate further. The model also frequently predicted 'negative' for reviews that were actually 'positive'.
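The per-class figures in the report can be re-derived from the confusion matrix. The following is a minimal sketch in plain Python, assuming the matrix uses the scikit-learn layout (rows are actual labels, columns are predicted labels, ordered negative then positive). That assumption is consistent with the row sums 492 and 508 matching the support column. The function name `binary_report` is hypothetical and is not part of the author's training script.

```python
def binary_report(cm):
    """Return per-class precision/recall/F1 and accuracy for a 2x2 confusion matrix.

    Assumes cm[i][j] counts samples of actual class i predicted as class j,
    with class 0 = negative and class 1 = positive.
    """
    labels = ["negative", "positive"]
    total = sum(sum(row) for row in cm)
    report = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        predicted = cm[0][k] + cm[1][k]  # column sum: everything predicted as class k
        actual = sum(cm[k])              # row sum: the support of class k
        precision = tp / predicted
        recall = tp / actual
        f1 = 2 * precision * recall / (precision + recall)
        report[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    report["accuracy"] = (cm[0][0] + cm[1][1]) / total
    return report


# Confusion matrix reported for run 1.
run1 = binary_report([[446, 46], [106, 402]])
for name in ("negative", "positive"):
    row = run1[name]
    print(f"{name}: precision={row['precision']:.2f} "
          f"recall={row['recall']:.2f} f1={row['f1']:.2f}")
print(f"accuracy={run1['accuracy']:.2f}")
```

Under this layout, the bottom-left cell (106) counts positive reviews predicted as negative, which is the error type noted above.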
2. Changed learning_rate 2e-4 -> 1e-4
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.82 | 0.88 | 0.85 | 492 |
| positive | 0.87 | 0.81 | 0.84 | 508 |
| accuracy | | | 0.84 | 1000 |
| macro avg | 0.84 | 0.84 | 0.84 | 1000 |
| weighted avg | 0.84 | 0.84 | 0.84 | 1000 |
Confusion matrix:
[[431, 61],
 [ 96, 412]]
Overall, the results did not improve compared with before the learning-rate change, so I decided to try raising the learning rate instead.
3. Changed learning_rate 1e-4 -> 4e-4
The results were not noticeably different from learning_rate 1e-4, so I decided to tune other settings instead.
4. Increased the batch size
Because of memory issues, seq_length in script_args was reduced to 450.
Even so, training kept failing with out-of-memory errors.
per_device_train_batch_size=1 -> per_device_train_batch_size=2
per_device_eval_batch_size=1 -> per_device_eval_batch_size=2
5. Increased gradient_accumulation_steps
Instead of increasing the batch size, I decided to change the gradient accumulation steps.
To avoid running out of memory, seq_length in script_args was reduced to 450.
gradient_accumulation_steps=2 -> gradient_accumulation_steps=4 -> gradient_accumulation_steps=8
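Gradient accumulation raises the number of samples that contribute to each optimizer update without raising the per-step memory footprint. The sketch below shows the arithmetic only. It assumes a per-device batch size of 1 (the value before step 4) and a single device; neither is stated for this run. The helper `effective_batch_size` is a hypothetical name.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer update.

    Gradients from `gradient_accumulation_steps` forward/backward passes are
    summed before the optimizer steps, so only `per_device_batch_size`
    samples need to fit in memory at a time on each device.
    """
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Assumed: per-device batch size 1, one device (not confirmed by the card).
sizes = [effective_batch_size(1, steps) for steps in (2, 4, 8)]
for steps, size in zip((2, 4, 8), sizes):
    print(f"gradient_accumulation_steps={steps} -> effective batch size {size}")
```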
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.85 | 0.88 | 0.87 | 492 |
| positive | 0.88 | 0.85 | 0.87 | 508 |
| accuracy | | | 0.87 | 1000 |
| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |
6. Decreased weight_decay, increased learning_rate
weight_decay=0.03 -> weight_decay=0.01
learning_rate=4e-4 -> learning_rate=5e-4
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.85 | 0.89 | 0.87 | 492 |
| positive | 0.89 | 0.85 | 0.87 | 508 |
| accuracy | | | 0.87 | 1000 |
| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |
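To keep track of what changed between runs, the settings can be held in plain dictionaries and diffed. This is only an illustrative sketch: the keys match the argument names used in this card, the `gradient_accumulation_steps` value of 8 is assumed to carry over from step 5, and all other training arguments are omitted because the card does not list them. `changed_keys` is a hypothetical helper.

```python
# Values taken from steps 3, 5 and 6 of this card; everything else is omitted.
before_step6 = {
    "learning_rate": 4e-4,
    "weight_decay": 0.03,
    "gradient_accumulation_steps": 8,  # assumed to carry over from step 5
}
after_step6 = {**before_step6, "learning_rate": 5e-4, "weight_decay": 0.01}


def changed_keys(old, new):
    """Map each setting whose value differs to an (old, new) pair."""
    return {key: (old.get(key), new[key]) for key in new if old.get(key) != new[key]}


step6_diff = changed_keys(before_step6, after_step6)
for key, (old_value, new_value) in step6_diff.items():
    print(f"{key}: {old_value} -> {new_value}")
```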
7. Removed the max_step limit
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.86 | 0.89 | 0.87 | 492 |
| positive | 0.89 | 0.86 | 0.87 | 508 |
| accuracy | | | 0.87 | 1000 |
| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |
Accuracy keeps improving very slightly, but there is no significant change from 0.87.
8. Reduced the learning rate further
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.84 | 0.90 | 0.87 | 492 |
| positive | 0.89 | 0.84 | 0.86 | 508 |
| accuracy | | | 0.87 | 1000 |
| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |
Confusion matrix:
[[441, 51],
 [ 83, 425]]
Compared with before, the model identifies 'positive' reviews better, but it identifies fewer 'negative' reviews correctly.
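The shift between classes can be read off the diagonals of the reported confusion matrices. The sketch below compares run 8 with the two earlier runs that include a confusion matrix (runs 1 and 2). It assumes rows are actual labels and columns are predicted labels, ordered negative then positive. The helper names are hypothetical.

```python
# Confusion matrices as reported in this card for runs 1, 2 and 8.
runs = {
    1: [[446, 46], [106, 402]],
    2: [[431, 61], [96, 412]],
    8: [[441, 51], [83, 425]],
}


def correct_per_class(cm):
    """Correctly classified counts, read from the diagonal of the matrix."""
    return {"negative": cm[0][0], "positive": cm[1][1]}


def delta(cm_new, cm_old):
    """Change in correctly classified samples per class between two runs."""
    new, old = correct_per_class(cm_new), correct_per_class(cm_old)
    return {label: new[label] - old[label] for label in new}


for earlier in (1, 2):
    print(f"run 8 vs run {earlier}: {delta(runs[8], runs[earlier])}")
```

Under that layout assumption, run 8 gets 23 more positive reviews and 5 fewer negative reviews right than run 1, which matches the remark above. Against run 2, both classes improve.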
In the end, I am concluding training at an accuracy of 0.87.
Framework versions
- PEFT 0.7.1