evan-nexusflow committed
Commit 4d7a1e9
Parent(s): f12e910

Update README.md

Files changed (1): README.md (+11, -6)
README.md CHANGED

@@ -16,8 +16,8 @@ tags:
  </p>


- We introduce Athene-V2-Chat-72B, an open-weights LLM that rivals GPT-4o across benchmarks. It is trained through RLHF based off Qwen-2.5-72B.
- Athene-V2-Chat-72B excels in chat, math and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Chat), surpasses GPT-4o in complex function calling and agent applications.
+ We introduce Athene-V2-Chat-72B, an open-weights LLM on par with GPT-4o across benchmarks. It is trained through RLHF with Qwen-2.5-72B-Instruct as the base model.
+ Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Agent), surpasses GPT-4o in complex function calling and agentic applications.

  Benchmark performance:

@@ -27,12 +27,13 @@ Benchmark performance:

  - **Developed by:** The Nexusflow Team
  - **Model type:** Chat Model
- - **Finetuned from model:** [Qwen 2.5 72B](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
+ - **Finetuned from model:** [Qwen 2.5 72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
  - **License**: [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-V2-Chat/blob/main/Nexusflow_Research_License_.pdf)
  - **Blog**: https://nexusflow.ai/blogs/athene-V2

  ## Usage
  Athene-V2-Chat uses the same chat template as Qwen 2.5 72B. Below is a simple usage example with the Transformers library.
+
  ```Python
  from transformers import AutoModelForCausalLM, AutoTokenizer

@@ -45,21 +46,25 @@ model = AutoModelForCausalLM.from_pretrained(
  )
  tokenizer = AutoTokenizer.from_pretrained(model_name)

- prompt = "Give me a short introduction to large language model."
+ prompt = "Write a Python function to return the nth Fibonacci number in log n runtime."
+
  messages = [
      {"role": "user", "content": prompt}
  ]
+
  text = tokenizer.apply_chat_template(
      messages,
      tokenize=False,
      add_generation_prompt=True
  )
+
  model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

  generated_ids = model.generate(
      **model_inputs,
-     max_new_tokens=512
+     max_new_tokens=2048
  )
+
  generated_ids = [
      output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
  ]
@@ -67,7 +72,7 @@ generated_ids = [
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  ```

- We found that by adding system prompts that enforce the model to think step by step, the model can do even better in math and problems like counting `r`s in strawberry. For fairness consideration we **do not** include such system prompt during chat evaluation.
+ Note that by adding a system prompt that encourages the model to think step by step, the model can improve further on difficult math queries and on problems like counting the `r`s in strawberry. For fairness, we **do not** include such a system prompt during chat evaluation.

  ## Acknowledgment
  We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support in testing the model. We would also like to thank the Qwen Team and the open-source community for their efforts in providing the datasets and base models.
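For readers who want to run the updated snippet end to end: the diff context elides the model-loading code, leaving only the `model = AutoModelForCausalLM.from_pretrained(` fragment in a hunk header. Below is a minimal sketch of that elided setup, assuming the `Nexusflow/Athene-V2-Chat` repo id and stock Transformers loading arguments rather than the README's verbatim ones:

```Python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of the loading code elided from the diff context. The repo id comes
# from the links above; torch_dtype/device_map are assumptions, not the
# README's verbatim arguments.
model_name = "Nexusflow/Athene-V2-Chat"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard the 72B weights across available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```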
 
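Since Athene-V2-Chat inherits the Qwen 2.5 chat template, `apply_chat_template` renders the conversation in ChatML form. Roughly, for the new example prompt (a sketch; the default system text depends on the template shipped with the checkpoint and is left elided here):

```
<|im_start|>system
...<|im_end|>
<|im_start|>user
Write a Python function to return the nth Fibonacci number in log n runtime.<|im_end|>
<|im_start|>assistant
```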
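For reference on what the new example prompt is asking for: the fast-doubling identities F(2k) = F(k)(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2 give the requested O(log n) runtime. A sketch of one correct answer (not the model's actual output):

```Python
def fib(n: int) -> int:
    """Return the nth Fibonacci number in O(log n) arithmetic steps."""
    def fib_pair(k: int) -> tuple[int, int]:
        # Returns (F(k), F(k+1)) via fast doubling.
        if k == 0:
            return 0, 1
        a, b = fib_pair(k >> 1)   # a = F(m), b = F(m+1) with m = k // 2
        c = a * (2 * b - a)       # F(2m)
        d = a * a + b * b         # F(2m + 1)
        return (d, c + d) if k & 1 else (c, d)
    return fib_pair(n)[0]
```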
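To try the step-by-step behavior mentioned in the closing note, prepend a system message to the conversation before applying the chat template. The exact system prompt Nexusflow used is not published, so the wording below is an assumption:

```Python
# The system prompt wording here is an assumption; the README does not
# publish the prompt used in Nexusflow's experiments. Reuses the tokenizer
# from the usage snippet above.
messages = [
    {
        "role": "system",
        "content": "Think through the problem step by step before giving your final answer.",
    },
    {"role": "user", "content": "How many r's are in the word strawberry?"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```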