---
license: gemma
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

Great at reasoning, but woke as fuck!

This is a fine-tune over Gemma-2-27B-it, since fine-tuning the base model does not produce coherent content. Compute sponsored by [KindoAI](https://kindo.ai).

```
You: Today is Tuesday. On Thursday last week, I had 10 apples. Then on Saturday I ate 5. I bought 8 apples, and 3 oranges yesterday. What day was yesterday?

Here's how to solve this problem:

* **"Thursday last week" was 5 days before Tuesday.** Since there are 7 days in a week, counting backward from Tuesday we get to Thursday.
* **"Yesterday" was 2 days before today.** Since today is Tuesday, yesterday was Monday.

Therefore, yesterday was Monday.

You: How many apples do I have today?

Let's break down the problem:

* You started with 10 apples on Thursday last week.
* You ate 5 apples on Saturday, leaving you with 10 - 5 = 5 apples.
* You bought 8 apples yesterday, so you had 5 + 8 = 13 apples.

Therefore, you have 13 apples today.

You: How many oranges did I have on Saturday?

You didn't have any oranges on Saturday because you bought 3 oranges yesterday.
```

# Inference Code:

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "migtissera/Tess-v2.5-Gemma-2-27B-alpha"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Gemma-2 closes each turn with the <end_of_turn> token; stop generation there.
terminators = [
    tokenizer.convert_tokens_to_ids("<end_of_turn>"),
]


def generate_text(llm_prompt):
    inputs = tokenizer.encode(llm_prompt, add_special_tokens=False, return_tensors="pt")
    input_ids = inputs.to("cuda")
    length = len(input_ids[0])

    # Sampling parameters
    instance = {
        "top_p": 1.0,
        "temperature": 0.75,
        "generate_len": 1024,
        "top_k": 50,
    }

    generation = model.generate(
        input_ids,
        max_length=length + instance["generate_len"],
        use_cache=True,
        do_sample=True,
        top_p=instance["top_p"],
        temperature=instance["temperature"],
        top_k=instance["top_k"],
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=terminators,
    )

    # Keep only the newly generated tokens, dropping the prompt.
    output = generation[0][length:]
    string = tokenizer.decode(output, skip_special_tokens=True)
    return f"{string}"


# Running conversation in the Gemma-2 chat format.
conversation = f"""<bos><start_of_turn>user\n"""

while True:
    user_input = input("You: ")
    llm_prompt = f"{conversation}{user_input}<end_of_turn>\n<start_of_turn>model\n"
    answer = generate_text(llm_prompt)
    print(answer)
    conversation = f"{llm_prompt}{answer}<end_of_turn>\n<start_of_turn>user\n"
```
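
Instead of concatenating the `<start_of_turn>`/`<end_of_turn>` markers by hand as above, you can let the tokenizer build the prompt. This is a minimal sketch, assuming the repository ships a Gemma-2 style chat template (inherited from Gemma-2-27B-it); the example message is illustrative only.

```python
# Sketch: build the prompt via the tokenizer's chat template instead of
# writing the Gemma turn markers manually. Assumes the repo includes a
# Gemma-2 style chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("migtissera/Tess-v2.5-Gemma-2-27B-alpha")

messages = [
    {"role": "user", "content": "Today is Tuesday. What day was yesterday?"},
]

# add_generation_prompt=True appends the opening model-turn marker so the
# model knows it should answer next.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```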
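
A 27B model in bfloat16 needs roughly 55 GB of weights, so the loading call above assumes multiple GPUs or a large single card. If that does not fit your hardware, one common workaround (not part of the original card) is 4-bit quantization with bitsandbytes; the sketch below shows the idea.

```python
# Sketch (assumption, not from the original card): load the model in 4-bit
# NF4 with bitsandbytes so it fits on a single ~24 GB GPU.
# Requires: pip install accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "migtissera/Tess-v2.5-Gemma-2-27B-alpha"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    quantization_config=bnb_config,
)
```

Generation quality may degrade slightly under 4-bit quantization, but the rest of the inference loop above works unchanged.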