---
license: cc-by-nc-4.0
---
# **TinyWand-SFT**
<p align="left">
<img src="./TinyWand.png" width="150"/>
</p>
# **Model Description**
**1.63B: how about an SLM of a modest size?**
## **Model Overview**
**TinyWand-SFT** is a 1.63B-parameter SLM. Thanks to its small 1.63B size, it can run on small devices and reach a high toks/s while still delivering strong performance.
## **Model License**
The model is currently released under the cc-by-nc-4.0 license, which prohibits commercial use; the same license applies equally to any model fine-tuned or continually pre-trained from its weights.
The license is scheduled to be revised in a few days, either to a free license or to a conditional one.
## **๋ชจ๋ธ ์„ฑ๋Šฅ**
TBD
## **ํ•™์Šต ๊ณผ์ •**
TBD
## **์‚ฌ์šฉ ์•ˆ๋‚ด**
**์ถ”๋ก ์— ํ•„์š”ํ•œ VRAM**
| ์–‘์žํ™” | ์ž…๋ ฅ ํ† ํฐ ์ˆ˜ | ์ถœ๋ ฅ ํ† ํฐ ์ˆ˜ | ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰ |
|---|---|---|---|
| bf16(base) | 64 | 256 | 3,888 MiB |
| q4_K_M | 64 | 256 | 1,788 MiB |
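As a rough sanity check on the bf16 figure above (an illustrative estimate, not a figure from this model card): 1.63B parameters at 2 bytes each account for most of the reported footprint, with the remainder going to the KV cache, activations, and framework overhead.

```python
# Rough VRAM estimate for the bf16 weights alone (2 bytes per parameter).
# Illustrative approximation; the table above reports measured usage.
params = 1.63e9
bytes_per_param = 2  # bfloat16
weight_mib = params * bytes_per_param / (1024 ** 2)
print(f"{weight_mib:,.0f} MiB")  # roughly 3,100 MiB of the reported 3,888 MiB
```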
**Prompt template**
This model uses the Alpaca prompt template.
The template can be inspected via `apply_chat_template()`; see the [Hugging Face chat templating docs](https://huggingface.co/docs/transformers/main/chat_templating).
**You can load and use the model with the Python code below.**
*transformers and torch must be installed beforehand.*
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # for an NVIDIA GPU

tokenizer = AutoTokenizer.from_pretrained("maywell/TinyWand-SFT")
model = AutoModelForCausalLM.from_pretrained(
    "maywell/TinyWand-SFT",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # switch to torch.float16 if your device does not support bfloat16
)

messages = [
    {"role": "system", "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request."},  # applied identically even when left empty
    {"role": "user", "content": "์–ธ์–ด๋ชจ๋ธ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๊ฐ€ ์ž‘์œผ๋ฉด ์–ด๋–ค ์ด์ ์ด ์žˆ์–ด?"},  # "What advantages does a language model gain from having fewer parameters?"
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```