---
license: apache-2.0
---

πŸŒžπŸš€ SOLAR-10.7x2_19B

A merge of two SOLAR-10.7B instruct finetunes.


Outperforms mistralai/Mixtral-8x7B-Instruct-v0.1.

πŸŒ… Code Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model (fp16 weights, spread across available devices)
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/SOLAR-math-2x10.7b")
model = AutoModelForCausalLM.from_pretrained(
    "macadeliccc/SOLAR-math-2x10.7b",
    device_map="auto",
    torch_dtype=torch.float16,
)

conversation = [
    {"role": "user", "content": "A rectangle has a length that is twice its width and its area is 50 square meters. Find the dimensions of the rectangle."}
]

# Render the conversation with the model's chat template
prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, use_cache=True, max_length=4096)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)
```

## Evaluations

The model is currently experimental and was evaluated in 4-bit precision.
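
For reference, a minimal sketch of 4-bit loading via the bitsandbytes backend (requires `pip install bitsandbytes`); the exact quantization settings used for the evaluation run are not documented, so the config below is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit quantization config; not necessarily the
# exact configuration used for the reported evaluation.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "macadeliccc/SOLAR-math-2x10.7b",
    device_map="auto",
    quantization_config=quantization_config,
)
```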

| Tasks         | Version | Filter | n-shot | Metric   | Value  | Stderr   |
|---------------|---------|--------|--------|----------|--------|----------|
| arc_challenge | Yaml    | none   | 0      | acc      | 0.5691 | ± 0.0145 |
|               |         | none   | 0      | acc_norm | 0.5998 | ± 0.0143 |
| arc_easy      | Yaml    | none   | 0      | acc      | 0.8354 | ± 0.0076 |
|               |         | none   | 0      | acc_norm | 0.8258 | ± 0.0078 |
| boolq         | Yaml    | none   | 0      | acc      | 0.8716 | ± 0.0059 |
| hellaswag     | Yaml    | none   | 0      | acc      | 0.6397 | ± 0.0048 |
|               |         | none   | 0      | acc_norm | 0.8268 | ± 0.0038 |
| openbookqa    | Yaml    | none   | 0      | acc      | 0.3380 | ± 0.0212 |
|               |         | none   | 0      | acc_norm | 0.4660 | ± 0.0223 |
| piqa          | Yaml    | none   | 0      | acc      | 0.8139 | ± 0.0091 |
|               |         | none   | 0      | acc_norm | 0.8205 | ± 0.0090 |
| winogrande    | Yaml    | none   | 0      | acc      | 0.7609 | ± 0.0120 |
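
The table follows the output format of EleutherAI's lm-evaluation-harness. A reproduction sketch using the harness's Python API (v0.4+); the task list and `load_in_4bit=True` are inferred from the table and the 4-bit note above, not confirmed settings:

```python
import lm_eval

# Zero-shot evaluation over the tasks listed above; batch size
# and device are left at the harness defaults.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=macadeliccc/SOLAR-math-2x10.7b,load_in_4bit=True",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
)
print(results["results"])
```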

πŸ“š Citations

```bibtex
@misc{kim2023solar,
      title={SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling},
      author={Dahyun Kim and Chanjun Park and Sanghoon Kim and Wonsung Lee and Wonho Song and Yunsu Kim and Hyeonwoo Kim and Yungi Kim and Hyeonju Lee and Jihoo Kim and Changbae Ahn and Seonghoon Yang and Sukyung Lee and Hyunbyung Park and Gyoungjin Gim and Mikyoung Cha and Hwalsuk Lee and Sunghun Kim},
      year={2023},
      eprint={2312.15166},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```