
Athena-1: Lightweight and Powerful Instruction-Following Model

Athena-1 is a fine-tuned, instruction-following large language model derived from Qwen/Qwen2.5-7B-Instruct. Designed to balance efficiency and performance, Athena-1 provides strong text-generation capabilities, making it suitable for a variety of real-world applications, including conversational AI, content creation, and structured data processing.


Key Features

🚀 Enhanced Performance

  • Instruction Following: Fine-tuned for excellent adherence to user prompts and instructions.
  • Coding and Mathematics: Proficient in solving coding problems and mathematical reasoning.
  • Lightweight: At 7.62 billion parameters, Athena-1-7B offers powerful performance while maintaining efficiency.

📖 Long-Context Understanding

  • Context Length: Supports up to 128K tokens, ensuring accurate handling of large documents or conversations.
  • Token Generation: Can generate up to 8K tokens of output.
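
The two limits above interact when planning a request, since the prompt and the generated tokens presumably share one context window. Below is a minimal sketch of a budget check, assuming "128K" means 131,072 tokens (the figure given under Model Details) and "8K" means 8,192. `plan_generation` is a hypothetical helper, not part of any library.

```python
MAX_CONTEXT_TOKENS = 131_072   # "128K" context window (assumed to be 128 * 1024)
MAX_OUTPUT_TOKENS = 8_192      # "8K" generation limit (assumed to be 8 * 1024)


def plan_generation(prompt_tokens: int, requested_new_tokens: int) -> int:
    """Return how many new tokens can be requested for a given prompt.

    ASSUMPTION: prompt and output share a single context window.
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= MAX_CONTEXT_TOKENS:
        raise ValueError("prompt alone fills or exceeds the context window")
    room_left = MAX_CONTEXT_TOKENS - prompt_tokens
    # The tightest of the three limits wins.
    return min(requested_new_tokens, MAX_OUTPUT_TOKENS, room_left)
```

The returned value could be passed as `max_new_tokens` when generating, so that a very long prompt shrinks the output budget instead of overflowing the window.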

🌍 Multilingual Support

  • Supports 29+ languages, including:
    • English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
    • Japanese, Korean, Vietnamese, Thai, Arabic, and more.

📊 Structured Data & Outputs

  • Structured Data Interpretation: Understands and processes structured formats like tables and JSON.
  • Structured Output Generation: Generates well-formatted outputs, including JSON and other structured formats.
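
Even a model tuned for structured output can wrap JSON in surrounding prose, so it is worth validating replies before using them. Here is a minimal sketch using only the standard library. `extract_json` is a hypothetical helper, and the sample reply is invented for illustration rather than real Athena-1 output.

```python
import json


def extract_json(reply: str):
    """Parse the first valid JSON object found in a model reply.

    Tries each '{' in turn and lets the decoder decide where the object
    ends, so text before and after the JSON is ignored.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            value, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        return value
    raise ValueError("no JSON object found in reply")


# Invented example reply, for illustration only.
sample_reply = 'Sure! Here is the record:\n{"name": "Ada", "langs": ["en", "fr"]}\nAnything else?'
record = extract_json(sample_reply)
print(record["name"], len(record["langs"]))
```

Raising `ValueError` when nothing parses makes it easy to retry the request or fall back to a default instead of silently passing malformed data downstream.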

Model Details

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias.
  • Parameters: 7.62B total (6.53B non-embedding).
  • Layers: 28
  • Attention Heads: 28 for Q, 4 for KV.
  • Context Length: Up to 131,072 tokens.
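
The head counts above describe grouped-query attention: 28 query heads share 4 key/value heads, which shrinks the key/value cache sevenfold compared with one KV head per query head. Below is a rough back-of-the-envelope sketch of the cache size. The head dimension of 128 and the 2-byte (bfloat16) element size are assumptions not stated in this card. The head dimension comes from the base model's published configuration, so check the model's `config.json` before relying on it.

```python
NUM_LAYERS = 28        # from Model Details
NUM_Q_HEADS = 28       # from Model Details
NUM_KV_HEADS = 4       # from Model Details
HEAD_DIM = 128         # ASSUMPTION: not stated in this card
BYTES_PER_VALUE = 2    # ASSUMPTION: bfloat16/float16 cache


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2 covers the separate key and value tensors in each layer.
    per_token = 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return num_tokens * per_token


full_context = 131_072
gqa = kv_cache_bytes(full_context)
mha = kv_cache_bytes(full_context, kv_heads=NUM_Q_HEADS)
print(f"4 KV heads at full context:  {gqa / 2**30:.1f} GiB")
print(f"28 KV heads at full context: {mha / 2**30:.1f} GiB")
```

Under these assumptions, a single full-length sequence needs about 7 GiB of cache with 4 KV heads, versus about 49 GiB with 28, in addition to the memory for the weights themselves.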

Applications

Athena-1 is designed for a broad range of use cases:

  • Conversational AI: Create natural, human-like chatbot experiences.
  • Code Generation: Generate, debug, or explain code snippets.
  • Mathematical Problem Solving: Assist with complex calculations and reasoning.
  • Document Processing: Summarize or analyze large documents.
  • Multilingual Applications: Support translation and other global use cases across diverse languages.
  • Structured Data: Process and generate structured data, including tables and JSON.

Quickstart

Here’s how to use Athena-1 for quick text generation:

# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="Spestly/Athena-1-7B")
print(pipe(messages))  # print the result so it is visible when run as a script

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-7B")
model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-7B")
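
The second snippet stops after loading the tokenizer and model. Here is a minimal sketch of how generation might continue, assuming the tokenizer ships a chat template (as the Qwen2.5 base model's does) and that the `accelerate` package is installed for `device_map="auto"`. `build_messages` and `generate_reply` are hypothetical helpers, and the system prompt is illustrative.

```python
def build_messages(user_prompt: str, system_prompt: str = "You are a helpful assistant."):
    """Build a chat-format message list (the default system prompt is illustrative)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(user_prompt: str, max_new_tokens: int = 512) -> str:
    """Load Athena-1-7B and generate one reply (downloads the weights on first use)."""
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Spestly/Athena-1-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # ASSUMPTION: requires the `accelerate` package
    )

    # Render the conversation into a prompt string via the chat template.
    text = tokenizer.apply_chat_template(
        build_messages(user_prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("Who are you?")` loads the model on every call. For repeated use, load the tokenizer and model once and reuse them.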