
	---
	library_name: peft
	base_model: meta-llama/Llama-2-7b-chat-hf
	---

	# Model Card for Model ID
	## euneeei/hw-llama-2-7B-nsmc
	<!-- Provide a quick summary of what the model is/does. -->




	## Training Details

	### Training Data

	<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
	- ### Korean-language Naver movie review dataset


	- ## train dataset: 3,000 examples
	- ## test dataset: 1,000 examples

	- ## Best accuracy after training: 0.87

	## 1. Same @dataclass parameters that reached 0.91 accuracy with midm
	- ### learning_rate : 2e-4

	| | precision | recall | f1-score | support |
	|----|----|----|----|----|
	| negative | 0.81 | 0.91 | 0.85 | 492 |
	| positive | 0.90 | 0.79 | 0.84 | 508 |
	| accuracy | | | 0.85 | 1000 |
	| macro avg | 0.85 | 0.85 | 0.85 | 1000 |
	| weighted avg | 0.85 | 0.85 | 0.85 | 1000 |


	- ### confusion Matrix:
	### [[ 446, 46 ]
	### [106, 402]]

	- ### Accuracy was 0.85, short of 0.90, so I decided to tune the learning rate further. In addition, a large number of reviews that were actually 'positive' were classified as 'negative'.
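The per-class numbers in the report above can be recomputed from the confusion matrix. The following is a minimal sketch in plain Python; the helper name `per_class_report` is hypothetical, and it assumes that rows are true labels and columns are predicted labels, in the order (negative, positive). That reading is consistent with the support column (492 and 508) but is not stated in the original.

```python
def per_class_report(matrix, labels):
    """Return ({label: (precision, recall, f1)}, accuracy).

    Assumes matrix[i][j] counts examples whose true class is i
    and whose predicted class is j.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum (support)
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report, correct / total


# Confusion matrix reported for run 1 (learning_rate 2e-4).
run1 = [[446, 46], [106, 402]]
report, accuracy = per_class_report(run1, ["negative", "positive"])

for label, (precision, recall, f1) in report.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Under that assumption, the 106 in the second row is the count of actually positive reviews predicted as negative, which is what pulls positive recall down to 0.79.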

	## 2. Changed learning_rate from 2e-4 to 1e-4

	- ### learning_rate : 1e-4


	| | precision | recall | f1-score | support |
	|----|----|----|----|----|
	| negative | 0.82 | 0.88 | 0.85 | 492 |
	| positive | 0.87 | 0.81 | 0.84 | 508 |
	| accuracy | | | 0.84 | 1000 |
	| macro avg | 0.84 | 0.84 | 0.84 | 1000 |
	| weighted avg | 0.84 | 0.84 | 0.84 | 1000 |


	- ### confusion Matrix:
	### [[ 431, 61 ]
	### [96, 412]]

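	- ### Overall, results did not improve compared with before the learning rate change, so I decided to try a higher learning rate.

	## 3. Changed learning_rate from 1e-4 to 4e-4

	- ### Results were not much different from learning_rate 1e-4, so I decided to adjust other settings.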

	## 4. Increased batch size

	- ### Because of memory issues, I reduced seq_length in script_args to 450.

	- ### However, training was still not possible because memory kept running out.
	per_device_train_batch_size=1
	-> per_device_train_batch_size=2
	per_device_eval_batch_size=1
	-> per_device_eval_batch_size=2


	## 5. Increased gradient_accumulation_steps

	- ### Instead of increasing the batch size, I decided to change the gradient accumulation steps.
	- ### To avoid running out of memory, I reduced seq_length in script_args to 450.
	gradient_accumulation_steps=2
	-> gradient_accumulation_steps=4
	-> gradient_accumulation_steps=8
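Gradient accumulation raises the number of examples that contribute to each optimizer update without raising the per-step memory footprint. The sketch below shows the arithmetic only. The function name `effective_batch_size` is hypothetical, and `per_device_train_batch_size=1` and a single device are assumptions: the original shows that a per-device batch size of 2 ran out of memory, but does not state the value or device count used in this step.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Examples contributing to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Assumed: per-device batch size of 1 on one device (not stated in the original).
for steps in (2, 4, 8):
    print(f"gradient_accumulation_steps={steps} -> effective batch size {effective_batch_size(1, steps)}")
```

Under those assumptions, moving from 2 to 8 accumulation steps quadruples the effective batch size while each forward and backward pass still processes one sequence.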



	| | precision | recall | f1-score | support |
	|----|----|----|----|----|
	| negative | 0.85 | 0.88 | 0.87 | 492 |
	| positive | 0.88 | 0.85 | 0.87 | 508 |
	| accuracy | | | 0.87 | 1000 |
	| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
	| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |


	- ### confusion Matrix:
	### [[ 435, 57 ]
	### [77, 431]]

	- ### Accuracy did not exceed 0.90, but the rate of getting 'negative' right increased.

	## 6. Decreased weight_decay, increased learning_rate

	weight_decay=0.03
	-> weight_decay=0.01
	learning_rate=4e-4
	-> learning_rate=5e-4



	| | precision | recall | f1-score | support |
	|----|----|----|----|----|
	| negative | 0.85 | 0.89 | 0.87 | 492 |
	| positive | 0.89 | 0.85 | 0.87 | 508 |
	| accuracy | | | 0.87 | 1000 |
	| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
	| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |


	- ### Result: 0.87, almost no difference from run 5.

	## 7. Removed the max_step limit

	| | precision | recall | f1-score | support |
	|----|----|----|----|----|
	| negative | 0.86 | 0.89 | 0.87 | 492 |
	| positive | 0.89 | 0.86 | 0.87 | 508 |
	| accuracy | | | 0.87 | 1000 |
	| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
	| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |


	- ### confusion Matrix:
	### [[ 436, 56 ]
	### [70, 438]]

	- ### Accuracy is improving very slightly, but it stays at about 0.87 without any large change.

	## 8. Reduced the learning rate further

	| | precision | recall | f1-score | support |
	|----|----|----|----|----|
	| negative | 0.84 | 0.90 | 0.87 | 492 |
	| positive | 0.89 | 0.84 | 0.86 | 508 |
	| accuracy | | | 0.87 | 1000 |
	| macro avg | 0.87 | 0.87 | 0.87 | 1000 |
	| weighted avg | 0.87 | 0.87 | 0.87 | 1000 |

	- ### confusion Matrix:
	### [[ 441, 51 ]
	### [83, 425]]

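	- ### Compared with the previous run, 'positive' is predicted more accurately, while 'negative' is predicted less accurately.
	- ### In the end, I am concluding training at an accuracy of 0.87.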
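Because the reported accuracies are rounded to two decimals, several runs look identical at 0.87. A minimal sketch, using only the confusion matrices listed in this card, recovers the unrounded accuracies so the runs can be ranked. Runs 3, 4, and 6 are omitted because no confusion matrix is given for them; the variable and function names are illustrative.

```python
# Confusion matrices as reported above, keyed by run number.
runs = {
    "1": [[446, 46], [106, 402]],
    "2": [[431, 61], [96, 412]],
    "5": [[435, 57], [77, 431]],
    "7": [[436, 56], [70, 438]],
    "8": [[441, 51], [83, 425]],
}


def accuracy(matrix):
    """Fraction of examples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


accuracies = {run: accuracy(matrix) for run, matrix in runs.items()}
best_run = max(accuracies, key=accuracies.get)

for run, value in accuracies.items():
    print(f"run {run}: accuracy={value:.3f}")
print(f"best run: {best_run}")
```

By this calculation the run without the max_step limit (run 7) has the highest accuracy among the listed matrices, at 874 correct out of 1,000.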


	### Framework versions

	- PEFT 0.7.1