---
license: apache-2.0
model-index:
  - name: TinyWand-SFT
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 31.4
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/TinyWand-SFT
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 49.96
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/TinyWand-SFT
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 25.98
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/TinyWand-SFT
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 43.08
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/TinyWand-SFT
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 55.17
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/TinyWand-SFT
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 2.05
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/TinyWand-SFT
          name: Open LLM Leaderboard
---

# TinyWand-SFT

*Korean model description*

How about an SLM of a humble 1.63B size?

๋ชจ๋ธ ์†Œ๊ฐœ

TinyWand-SFT๋Š” 1.63B์˜ SLM ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ 1.63B๋ผ๋Š” ์ž‘์€ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง์œผ๋กœ์จ ์†Œํ˜•๊ธฐ๊ธฐ์—์„œ ๊ตฌ๋™๋˜๊ฑฐ๋‚˜ ํฐ toks/s๋ฅผ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์Œ๊ณผ ๋™์‹œ์— ๊ฐ•๋ ฅํ•œ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

๋ชจ๋ธ ๋ผ์ด์„ผ์Šค

apache-2.0

๋ชจ๋ธ ์„ฑ๋Šฅ

TBD

## Limitations

Due to its small size, the model may fail to respond properly after instruct fine-tuning when the prompt does not follow its template. For a specific task, fine-tuning is recommended over prompting.

For the same reason, it also scores quite low on general benchmarks.

## Training Process

TBD

## Usage

### VRAM required for inference

| Quantization | Input tokens | Output tokens | Memory usage |
|--------------|--------------|---------------|--------------|
| bf16 (base)  | 64           | 256           | 3,888 MiB    |
| q4_K_M       | 64           | 256           | 1,788 MiB    |
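As a rough sanity check on the bf16 figure above (a back-of-envelope sketch, not a measurement), the weights alone at 2 bytes per parameter account for most of that footprint; the remainder is activations, KV cache, and runtime overhead:

```python
# Back-of-envelope estimate of bf16 weight memory (illustrative only).
# 1.63B parameters * 2 bytes each, converted to MiB.
n_params = 1.63e9
bytes_per_param_bf16 = 2
weight_mib = n_params * bytes_per_param_bf16 / 2**20
print(f"~{weight_mib:,.0f} MiB of weights")  # roughly 3,100 MiB; measured total is 3,888 MiB
```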

### Prompt Template

This model uses the Alpaca prompt template.

The template can be inspected from the Hugging Face tokenizer via apply_chat_template().
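For reference, the canonical Alpaca layout looks roughly like the sketch below. This is the generic Alpaca format shown for illustration only; the model's exact template should be taken from apply_chat_template():

```python
# Generic Alpaca prompt layout (illustrative; may differ in detail from the
# model's actual chat template).
ALPACA_TEMPLATE = (
    "{system}\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(
    system=(
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
    ),
    instruction="What are the advantages of a language model with fewer parameters?",
)
print(prompt)
```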

You can load and use the model with the Python code below. transformers and torch must be installed beforehand.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # for NVIDIA GPUs

tokenizer = AutoTokenizer.from_pretrained("maywell/TinyWand-SFT")
model = AutoModelForCausalLM.from_pretrained(
    "maywell/TinyWand-SFT",
    torch_dtype=torch.bfloat16,  # switch to torch.float16 if your device does not support bfloat16
)

messages = [
    {"role": "system", "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request."},  # applied the same way even when left empty
    {"role": "user", "content": "What are the advantages of a language model with fewer parameters?"},
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

## Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/TinyWand-SFT).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 34.61 |
| AI2 Reasoning Challenge (25-Shot) | 31.40 |
| HellaSwag (10-Shot)               | 49.96 |
| MMLU (5-Shot)                     | 25.98 |
| TruthfulQA (0-shot)               | 43.08 |
| Winogrande (5-shot)               | 55.17 |
| GSM8k (5-shot)                    |  2.05 |
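The average above is simply the arithmetic mean of the six benchmark scores, which can be verified directly:

```python
# Recompute the leaderboard average from the individual benchmark scores.
scores = {
    "ARC (25-shot)": 31.40,
    "HellaSwag (10-shot)": 49.96,
    "MMLU (5-shot)": 25.98,
    "TruthfulQA (0-shot)": 43.08,
    "Winogrande (5-shot)": 55.17,
    "GSM8k (5-shot)": 2.05,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 34.61
```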