---
library_name: peft
base_model: meta-llama/Llama-2-7b-chat-hf
---

# Model Card for Model ID

## euneeei/hw-llama-2-7B-nsmc

<!-- Provide a quick summary of what the model is/does. -->

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- ### A Korean-language Naver movie review dataset.

- ## train dataset: 3,000 examples

- ## test dataset: 1,000 examples

- ## Training result: best accuracy of 0.87

## **1. Same @dataclass parameters that reached 0.91 accuracy with midm**

- ### learning_rate : 2e-4

| | precision | recall | f1-score | support |
|----|----|----|-------|------|
| negative | 0.81 | 0.91 | 0.85 | 492 |
| positive | 0.90 | 0.79 | 0.84 | 508 |
| accuracy | | | 0.85 | 1000 |
| macro avg | 0.85 | 0.85 | 0.85 | 1000 |
| weighted avg | 0.85 | 0.85 | 0.85 | 1000 |

- ### Confusion matrix:

### [[446, 46]

### [106, 402]]

- ### Accuracy was 0.85, short of the 0.90 target, so I decided to adjust the learning rate further. In addition, a large number of reviews that were actually 'positive' were classified as 'negative' (106 of 508).

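The report table in run 1 can be re-derived from its confusion matrix. The sketch below is a minimal, dependency-free illustration; it assumes rows are true labels and columns are predicted labels (the scikit-learn convention), and the helper name `classification_metrics` is hypothetical, not part of the original training script.

```python
def classification_metrics(cm, labels=("negative", "positive")):
    """Per-class precision, recall, F1 and overall accuracy from a confusion matrix.

    Assumption: cm[i][j] counts examples whose true label is labels[i]
    and whose predicted label is labels[j].
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    report = {}
    for i, label in enumerate(labels):
        true_positive = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum: everything predicted as this label
        support = sum(cm[i])                   # row sum: everything that truly has this label
        precision = true_positive / predicted
        recall = true_positive / support
        f1 = 2 * precision * recall / (precision + recall)
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    report["accuracy"] = correct / total
    return report


# Confusion matrix reported for run 1 (learning_rate 2e-4).
run1 = classification_metrics([[446, 46], [106, 402]])
for name in ("negative", "positive"):
    m = run1[name]
    print(f"{name:>8}  {m['precision']:.2f}  {m['recall']:.2f}  {m['f1']:.2f}  {m['support']}")
print(f"accuracy  {run1['accuracy']:.2f}")
```

The low recall for the positive class (402 of 508) is the same effect described above: many positive reviews end up in the negative column.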
## **2. Changed learning_rate from 2e-4 to 1e-4**

- ### learning_rate : 1e-4

| | precision | recall | f1-score | support |
|----|----|----|-------|------|
| negative | 0.82 | 0.88 | 0.85 | 492 |
| positive | 0.87 | 0.81 | 0.84 | 508 |
| accuracy | | | 0.84 | 1000 |
| macro avg | 0.84 | 0.84 | 0.84 | 1000 |
| weighted avg | 0.84 | 0.84 | 0.84 | 1000 |

- ### Confusion matrix:

### [[431, 61]

### [96, 412]]

- ### Overall, the results did not improve over the previous learning rate, so I decided to try a higher learning rate.

## **3. Changed learning_rate from 1e-4 to 4e-4**

- ### The results were not significantly different from learning_rate 1e-4, so I decided to tune other settings instead.

## **4. Increase the batch size**

- ### Because of memory issues, seq_length in script_args was reduced to 450.

- ### Even so, memory kept running out and training was not possible.

`per_device_train_batch_size=1` -> `per_device_train_batch_size=2`

`per_device_eval_batch_size=1` -> `per_device_eval_batch_size=2`

## **5. Increase gradient_accumulation_steps**

- ### Instead of increasing the batch size, I decided to change gradient_accumulation_steps.

- ### To avoid running out of memory, seq_length in script_args was reduced to 450.

`gradient_accumulation_steps=2` -> `gradient_accumulation_steps=4` -> `gradient_accumulation_steps=8`

| | precision | recall | f1-score | support |
|----|----|----|-------|------|
| negative | 0.85 | 0.88 | 0.87 | 492 |
| positive | 0.88 | 0.85 | 0.87 | 508 |
| accuracy | | | 0.87 | 1000 |
| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |

- ### Confusion matrix:

### [[435, 57]

### [77, 431]]

- ### Accuracy did not exceed 0.90, but the share of correctly classified "negative" reviews increased.

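Gradient accumulation stands in for a larger batch because gradients from several small forward/backward passes are summed before each optimizer update. The arithmetic can be sketched as follows, assuming a single GPU and `per_device_train_batch_size=1` (the value that fit in memory in run 4). The function name `effective_batch_size` and the `num_devices` parameter are illustrative only.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of training examples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Assumption: one device and a per-device batch size of 1, as in run 4.
for steps in (2, 4, 8):
    size = effective_batch_size(per_device_batch_size=1, gradient_accumulation_steps=steps)
    print(f"gradient_accumulation_steps={steps} -> effective batch size {size}")
```

Under these assumptions, moving from 2 to 8 accumulation steps quadruples the effective batch size while peak activation memory stays close to that of a single-example batch.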
## **6. Decrease weight_decay, increase learning_rate**

`weight_decay=0.03` -> `weight_decay=0.01`

`learning_rate=4e-4` -> `learning_rate=5e-4`

| | precision | recall | f1-score | support |
|----|----|----|-------|------|
| negative | 0.85 | 0.89 | 0.87 | 492 |
| positive | 0.89 | 0.85 | 0.87 | 508 |
| accuracy | | | 0.87 | 1000 |
| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |

- ### Result: 0.87, almost no difference from run 5.

## **7. Remove the max_step limit**

| | precision | recall | f1-score | support |
|----|----|----|-------|------|
| negative | 0.86 | 0.89 | 0.87 | 492 |
| positive | 0.89 | 0.86 | 0.87 | 508 |
| accuracy | | | 0.87 | 1000 |
| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |

- ### Confusion matrix:

### [[436, 56]

### [70, 438]]

- ### Predictions are getting slightly more accurate with each run, but accuracy stays at about 0.87 with no major change.

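Once the step cap is removed, the length of training is governed by the number of epochs rather than by a fixed step count. A rough estimate of the resulting number of optimizer updates is sketched below. It assumes one training example per sequence (no packing), a single device, `per_device_train_batch_size=1`, and `gradient_accumulation_steps=8` carried over from run 5. The epoch count is a placeholder, since the card does not state it, and `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, gradient_accumulation_steps,
                    num_epochs, num_devices=1):
    """Approximate number of optimizer updates for epoch-based training."""
    examples_per_update = per_device_batch_size * gradient_accumulation_steps * num_devices
    updates_per_epoch = math.ceil(num_examples / examples_per_update)
    return updates_per_epoch * num_epochs


# 3,000 training examples; num_epochs=1 is an assumed placeholder value.
steps_per_epoch = optimizer_steps(
    num_examples=3000,
    per_device_batch_size=1,
    gradient_accumulation_steps=8,
    num_epochs=1,
)
print(steps_per_epoch)
```

If the earlier step limit was below this estimate, the earlier runs would have stopped before seeing the full training set once, which could explain the small gain in this run.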
## **8. Lower the learning rate further**

| | precision | recall | f1-score | support |
|----|----|----|-------|------|
| negative | 0.84 | 0.90 | 0.87 | 492 |
| positive | 0.89 | 0.84 | 0.86 | 508 |
| accuracy | | | 0.87 | 1000 |
| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |

- ### Confusion matrix:

### [[441, 51]

### [83, 425]]

- ### Compared with the previous run, 'negative' reviews are classified correctly more often (441 vs. 436), but correct 'positive' predictions dropped (425 vs. 438).

- ### I am ending training here with a final accuracy of 0.87.

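The runs that report a confusion matrix (1, 2, 5, 7, and 8) can be compared side by side at a finer resolution than the two-decimal report tables. The sketch below assumes rows are true labels in the order negative, positive. The dictionary keys and the `summarize` helper are illustrative names, not part of the original script.

```python
# Confusion matrices copied from the runs above: [[TN, FP], [FN, TP]]
# with 'positive' treated as the positive class.
runs = {
    "1": [[446, 46], [106, 402]],
    "2": [[431, 61], [96, 412]],
    "5": [[435, 57], [77, 431]],
    "7": [[436, 56], [70, 438]],
    "8": [[441, 51], [83, 425]],
}


def summarize(cm):
    """Accuracy and per-class recall for a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "negative_recall": tn / (tn + fp),
        "positive_recall": tp / (fn + tp),
    }


summary = {run: summarize(cm) for run, cm in runs.items()}
for run, s in summary.items():
    print(f"run {run}: accuracy={s['accuracy']:.3f} "
          f"negative_recall={s['negative_recall']:.3f} "
          f"positive_recall={s['positive_recall']:.3f}")

best = max(summary, key=lambda run: summary[run]["accuracy"])
print("highest accuracy: run", best)
```

At three decimals the runs that all round to 0.87 separate slightly: run 7 reaches 0.874, while runs 5 and 8 both reach 0.866.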
### Framework versions

- PEFT 0.7.1