Model Card for Fine-tuned OpenELM-270M

Model Details

Basic Information

Model Name: Fine-tuned OpenELM-270M
Model Type: Causal Language Model
Base Model: apple/OpenELM-270M-Instruct
Model Architecture: Transformer-based language model
Parameters: 270 million
Language(s): English

Model Architecture

OpenELM-270M is a decoder-only transformer designed for efficient language modeling; it uses layer-wise scaling to allocate parameters non-uniformly across its layers. At 270 million parameters, it is small compared to most modern language models.

Intended Use

This model is fine-tuned for general conversation and task completion. It is designed to engage in dialogue and provide information across a wide range of topics.

Primary intended uses

General conversation
Question answering
Task completion

Out-of-scope use cases

Generation of harmful or biased content
Critical decision-making without human oversight
Tasks requiring real-time or post-training knowledge

Training Data

The model was fine-tuned on a synthetic dataset in which the user queries were generated by GPT-4 and the responses by Claude 3 Opus and Claude 3.5 Sonnet. The dataset covers a wide range of topics and task types.

Dataset characteristics

Type: Synthetic, instruction-following conversations
Domains covered: Diverse, covering multiple areas of knowledge

Performance and Limitations

Performance Metrics

Training Loss: Final loss of 1.3721 after 3 epochs
Real-world Use: The model appears to struggle with maintaining conversational context when run on CUDA; CPU inference produces noticeably more coherent results. The cause has not been identified.

Limitations and Current Shortcomings

The model's knowledge is limited to its training data and cut-off date.
It may occasionally produce inaccurate or inconsistent information.
The model's performance on tasks requiring recent knowledge or specialized expertise may be limited.
Current issues include:
- Outputting special tokens in responses, which should be invisible to the user.
- Generating overly long responses that may be cut off due to context window limitations.
- Potential difficulty in maintaining conversation context over multiple turns.
- Occasionally generating responses that don't directly address the user's input.
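
The first issue in the list above, special tokens leaking into responses, can be worked around in post-processing. The following is a minimal sketch, assuming the leaked tokens follow the Llama 3 `<|name|>` pattern used by this model's chat format. `clean_response` is a hypothetical helper, not part of the model or of the Transformers library.

```python
import re

# Matches Llama 3 style special tokens such as <|eot_id|> or <|start_header_id|>.
SPECIAL_TOKEN = re.compile(r"<\|[a-z_0-9]+\|>")

def clean_response(text: str) -> str:
    """Cut the text at the first end-of-turn marker, then strip leftover special tokens."""
    # Anything after <|eot_id|> belongs to a new turn the model started on its own.
    text = text.split("<|eot_id|>", 1)[0]
    return SPECIAL_TOKEN.sub("", text).strip()

print(clean_response("Hi there!<|eot_id|><|start_header_id|>user<|end_header_id|>extra"))
```

Cutting at the first `<|eot_id|>` also limits the damage when the model runs on and starts writing a new turn by itself.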

Ethical Considerations

The model may reflect biases present in its training data.
It should not be used for generating harmful, illegal, or discriminatory content.
Users should be aware that the model can generate plausible-sounding but incorrect information.

Caveats and Recommendations

Always verify important information produced by the model against reliable sources.
The model should be used as an assistive tool and not for making critical decisions without human oversight.
Regular evaluation and fine-tuning may be necessary to maintain performance and relevance.

Training Procedure

Training Hyperparameters

Number of Epochs: 3
Learning Rate: Decayed over training from a higher (unspecified) starting value to 1.5815959741193386e-07 at the end of training

Training Hardware

Hardware Type: CPU (i7-11700)
Hours of Training: Approximately 51 hours

Framework and Tokenizer

Framework: PyTorch, Transformers
Tokenizer: Uses Llama 3 chat format with special tokens

Evaluation Results

Detailed evaluation results are not available, but the model showed consistent improvement in loss throughout training.

Quantitative Analyses

Training Loss Curve: The loss decreased from initial values around 2.1 to final values around 1.37-1.40, showing consistent improvement across epochs.
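
For intuition, a cross-entropy loss can be converted to perplexity by exponentiation. The sketch below assumes the reported values are mean per-token cross-entropy in nats, which is the usual convention for PyTorch training loops but is not confirmed by this card.

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(loss)

# Loss values quoted in this card.
for label, loss in [("initial", 2.1), ("final", 1.3721)]:
    print(f"{label}: loss={loss} -> perplexity={perplexity(loss):.2f}")
```

Under that assumption, training perplexity fell from roughly 8.2 to roughly 3.9. These are training-set figures, not held-out evaluation results.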

Model Inputs and Outputs

Input Format: Uses Llama 3 chat format with the following structure:

<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>[system_message]<|eot_id|>
<|start_header_id|>user<|end_header_id|>[user_input]<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Output: Generated text completions following the assistant's response format
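
The input format above can be assembled programmatically instead of by hand. This is a minimal sketch: `build_prompt` is a hypothetical helper, and it joins the segments without newlines, matching the one-line example prompt in the getting-started code.

```python
def build_prompt(messages):
    """Render a list of {"role", "content"} dicts into the Llama 3 chat format."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>"
            f"{message['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>")
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])
print(prompt)
```

Passing earlier user and assistant turns in the same list gives a multi-turn prompt in the same format.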

Technical Specifications

Context Window: Initially 2048 tokens, with the potential to be increased to 4096 or 8192 tokens
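
With a 2048-token window, a multi-turn conversation has to drop old turns before the prompt overflows. A minimal sketch follows. `trim_history` and its `count_tokens` callback are hypothetical names, and the whitespace-based counter in the usage example is only a stand-in for the real tokenizer (for example `lambda s: len(tokenizer(s)["input_ids"])`).

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def trim_history(
    messages: List[Message],
    count_tokens: Callable[[str], int],
    max_tokens: int = 2048,
    reserve_for_reply: int = 256,
) -> List[Message]:
    """Drop the oldest non-system turns until the history fits the token budget."""
    # Note: this counts message content only, not the chat-format header tokens.
    budget = max_tokens - reserve_for_reply
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(msgs: List[Message]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    # Always keep the most recent turn, even if it alone exceeds the budget.
    while len(turns) > 1 and total(system + turns) > budget:
        turns.pop(0)
    return system + turns

# Usage with a crude whitespace counter standing in for the tokenizer.
history = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "first question " * 5},
    {"role": "assistant", "content": "first answer " * 5},
    {"role": "user", "content": "second question"},
]
trimmed = trim_history(history, lambda s: len(s.split()), max_tokens=20, reserve_for_reply=5)
print([m["role"] for m in trimmed])
```

Reserving part of the window for the reply keeps generation from being cut off by the context limit.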

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "QuietImpostor/OpenELM-270M-Instruct-SonnOpus"
# trust_remote_code=True allows custom modeling code from the model repository to run.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

def generate_response(prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            # max_new_tokens limits only the generated tokens, not prompt + output.
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            top_p=0.9,
            do_sample=True
        )
    # Decode only the newly generated tokens so the prompt is not echoed back.
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
    return response.strip()

# Example usage
system_msg = "You are a helpful AI assistant."
user_input = "Hello, how are you?"
prompt = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>{system_msg}<|eot_id|><|start_header_id|>user<|end_header_id|>{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
response = generate_response(prompt)
print(response)
