---
library_name: transformers
license: mit
datasets:
- teknium/OpenHermes-2.5
---

# microsoft/phi-2 + teknium/OpenHermes-2.5

## Training

* QLoRA: rank 32, LR 2e-5, 1 epoch
* effective batch size: 200
* max. sequence length: 1024 tokens
* training code in `code/` (a hedged sketch of the setup appears at the end of this card)

## Evals

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| [g-ronimo/phi-2-OpenHermes-2.5](https://huggingface.co/g-ronimo/phi-2-OpenHermes-2.5) | 30.27 | 71.18 | 43.87 | 35.90 | 45.30 |
| [minghaowu/phi-2-OpenHermes-2.5](https://huggingface.co/minghaowu/phi-2-OpenHermes-2.5) | 27.95 | 67.55 | 48.07 | 36.17 | 44.94 |
| [phi-2](https://huggingface.co/microsoft/phi-2) | 27.96 | 70.84 | 44.46 | 35.17 | 44.61 |

## Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

modelpath = "g-ronimo/phi-2-OpenHermes-2.5"

model = AutoModelForCausalLM.from_pretrained(
    modelpath,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # attn_implementation="flash_attention_2",  # uncomment if flash-attn is installed
)
tokenizer = AutoTokenizer.from_pretrained(modelpath)

messages = [
    {"role": "system", "content": "answer like a pirate"},
    {"role": "user", "content": "what does it mean to be successful?"},
]

# Build the prompt with the model's chat template and move it to the model's device
input_tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

output_tokens = model.generate(input_tokens, max_new_tokens=500)
output = tokenizer.decode(output_tokens[0])

print(output)
```

> Ahoy there, matey! To me, being successful means having the wind in your sails and reaching the treasure you've been dreaming of. It's about setting sail on a journey with clear goals, working hard, facing challenges head-on, and never losing sight of what truly matters. So, set your compass right, hoist your Jolly Roger high, and let's embark on this adventure together! ⚓️💰⛵️
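
## Training sketch

A minimal, hedged sketch of the QLoRA setup listed under Training (rank 32, LR 2e-5, 1 epoch, effective batch size 200, bf16). The actual script lives in `code/`; `lora_alpha`, `lora_dropout`, `target_modules`, and the per-device/accumulation split of the batch size are illustrative assumptions, not values taken from this card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit NF4 quantization (the "Q" in QLoRA)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters at rank 32 as stated above; the remaining values are assumptions
model = get_peft_model(model, LoraConfig(
    r=32,
    lora_alpha=32,                                            # assumption
    lora_dropout=0.05,                                        # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],   # assumption for phi-2
    task_type="CAUSAL_LM",
))

# LR 2e-5, 1 epoch, effective batch size 200 (8 x 25 is one possible split)
args = TrainingArguments(
    output_dir="phi-2-OpenHermes-2.5",
    learning_rate=2e-5,
    num_train_epochs=1,
    per_device_train_batch_size=8,     # assumption
    gradient_accumulation_steps=25,    # 8 * 25 = 200 effective
    bf16=True,
)
```

Sequences would additionally be packed or truncated to the 1024-token maximum stated above during dataset preparation.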
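
## Eval sketch

The benchmark columns above (AGIEval, GPT4All, TruthfulQA, Bigbench) match the commonly used Nous evaluation suite; as an illustration, here is how a single task could be scored with EleutherAI's lm-evaluation-harness Python API. The exact harness, task groupings, and settings behind the table are assumptions, not documented on this card.

```python
import lm_eval

# Score one representative task; the full suite aggregates many more tasks
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=g-ronimo/phi-2-OpenHermes-2.5,dtype=bfloat16",
    tasks=["truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"])
```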