Model Card for Fine-tuned OpenELM-270M

Model Details

Basic Information

  • Model Name: Fine-tuned OpenELM-270M
  • Model Type: Causal Language Model
  • Base Model: Apple/OpenELM-270M-Instruct
  • Model Architecture: Transformer-based language model
  • Parameters: 270 million
  • Language(s): English

Model Architecture

OpenELM-270M is a decoder-only transformer from Apple's OpenELM family, designed for efficient language modeling. With 270 million parameters, it is small compared with many modern language models.

Intended Use

This model is fine-tuned for general conversation and task completion. It is designed to engage in dialogue and provide information across a wide range of topics.

Primary intended uses

  • General conversation
  • Question answering
  • Task completion

Out-of-scope use cases

  • Generation of harmful or biased content
  • Critical decision-making without human oversight
  • Tasks requiring real-time or post-training knowledge

Training Data

The model was fine-tuned on a synthetic dataset whose user queries were generated by GPT-4 and whose responses were generated by Claude 3 Opus and Claude 3.5 Sonnet. The dataset covers a wide range of topics and task types.

Dataset characteristics

  • Type: Synthetic, instruction-following conversations
  • Domains covered: Multiple areas of knowledge

Performance and Limitations

Performance Metrics

  • Training Loss: Final loss of 1.3721 after 3 epochs
  • Real-world Use: Seems to struggle to maintain conversational context when run on CUDA; CPU inference produces much more coherent results.

Limitations and Current Shortcomings

  • The model's knowledge is limited to its training data and cut-off date.
  • It may occasionally produce inaccurate or inconsistent information.
  • The model's performance on tasks requiring recent knowledge or specialized expertise may be limited.
  • Current issues include:
    • Outputting special tokens in responses, which should be invisible to the user.
    • Generating overly long responses that may be cut off due to context window limitations.
    • Potential difficulty in maintaining conversation context over multiple turns.
    • Occasionally generating responses that don't directly address the user's input.
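Two of these issues can be mitigated after generation: leaked special tokens, and replies that run past the end of the assistant's turn. The sketch below is a minimal workaround, not a fix for the underlying training issue. It assumes the decoded text contains the Llama 3 markers literally, for example when decoding with skip_special_tokens=False or when the markers are not registered as special tokens. `clean_response` is a hypothetical helper, not part of the model repository.

```python
import re

# Llama 3 chat-format markers. Assumption: these are the tokens that leak into
# responses; extend the list if other markers show up in practice.
SPECIAL_TOKENS = [
    "<|begin_of_text|>",
    "<|end_of_text|>",
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>",
]


def clean_response(text: str) -> str:
    """Cut the text at the first end-of-turn marker, then drop stray markers."""
    # Anything after <|eot_id|> belongs to a turn the model should not have written.
    text = text.split("<|eot_id|>", 1)[0]
    # Remove a leaked header block such as "<|start_header_id|>assistant<|end_header_id|>".
    text = re.sub(r"<\|start_header_id\|>.*?<\|end_header_id\|>", "", text, flags=re.DOTALL)
    # Remove any remaining individual markers.
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    return text.strip()


raw = ("<|start_header_id|>assistant<|end_header_id|>I'm doing well, thanks!<|eot_id|>"
       "<|start_header_id|>user<|end_header_id|>And you?")
print(clean_response(raw))
```

Cutting at the first <|eot_id|> also trims replies that continue into an invented user turn, which is one form of the off-topic output listed above.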

Ethical Considerations

  • The model may reflect biases present in its training data.
  • It should not be used for generating harmful, illegal, or discriminatory content.
  • Users should be aware that the model can generate plausible-sounding but incorrect information.

Caveats and Recommendations

  • Always verify important information produced by the model against reliable sources.
  • The model should be used as an assistive tool and not for making critical decisions without human oversight.
  • Regular evaluation and fine-tuning may be necessary to maintain performance and relevance.

Training Procedure

Training Hyperparameters

  • Number of Epochs: 3
  • Learning Rate: Decayed over training from a higher initial value to a final value of 1.5815959741193386e-07

Training Hardware

  • Hardware Type: CPU (i7-11700)
  • Training Time: Approximately 51 hours

Framework and Tokenizer

  • Framework: PyTorch, Transformers
  • Tokenizer: Uses Llama 3 chat format with special tokens

Evaluation Results

Detailed evaluation results are not available; the reported training loss decreased consistently throughout training.

Quantitative Analyses

  • Training Loss Curve: The loss decreased from initial values around 2.1 to final values around 1.37-1.40, showing consistent improvement across epochs.
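The reported losses are easier to interpret as perplexities. This minimal sketch assumes the losses are mean per-token cross-entropy in nats, the usual PyTorch/Transformers convention; the card does not state this. Under that assumption, perplexity is exp(loss). These are training-set figures, so they say nothing about held-out performance.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity from a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_cross_entropy_nats)


# Training-loss values quoted on this card (approximate where the card says "around").
reported = [("initial (approx.)", 2.1), ("final, upper end", 1.40), ("final, epoch 3", 1.3721)]

for label, loss in reported:
    print(f"{label}: loss {loss} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, the loss falling from about 2.1 to 1.3721 corresponds to perplexity dropping from about 8.17 to about 3.94.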

Model Inputs and Outputs

  • Input Format: Uses Llama 3 chat format with the following structure:
    <|begin_of_text|>
    <|start_header_id|>system<|end_header_id|>[system_message]<|eot_id|>
    <|start_header_id|>user<|end_header_id|>[user_input]<|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    
  • Output: Generated text completions following the assistant's response format
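The same layout can plausibly extend to multi-turn conversations by repeating one header/content/<|eot_id|> block per turn. This is a minimal sketch under two assumptions. First, earlier turns are concatenated exactly like the single-turn example in this card. Second, there is no newline after <|end_header_id|>, unlike Meta's reference Llama 3 template, which inserts two newlines there. `build_prompt` is a hypothetical helper.

```python
from typing import List, Tuple


def build_prompt(system_message: str, turns: List[Tuple[str, str]]) -> str:
    """Build a prompt in this card's Llama 3 layout.

    turns: (role, content) pairs with role "user" or "assistant", oldest first.
    The result ends with an open assistant header so the model writes the next reply.
    """
    parts = ["<|begin_of_text|>"]
    parts.append(f"<|start_header_id|>system<|end_header_id|>{system_message}<|eot_id|>")
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|start_header_id|>{role}<|end_header_id|>{content}<|eot_id|>")
    # Leave the assistant turn open for generation.
    parts.append("<|start_header_id|>assistant<|end_header_id|>")
    return "".join(parts)


history = [("user", "Hi"), ("assistant", "Hello! How can I help?"), ("user", "What's 2+2?")]
print(build_prompt("You are a helpful AI assistant.", history))
```

With a single user turn, this produces the same string as the f-string prompt in the getting-started example.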

Technical Specifications

  • Context Window: Initially 2048 tokens, with the potential to be increased to 4096 or 8192 tokens
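The context window is 2048 tokens, and the card reports replies being cut off and context being lost over multiple turns. Long conversations therefore likely need trimming before each call. This is a minimal sketch of one common strategy: drop the oldest turns first while reserving room for the reply. The function name, the `render` and `count_tokens` callables, and the 256-token reserve are assumptions for illustration. The reserve matches the 256-token limit in the getting-started example. With the model's tokenizer, `count_tokens` could be `lambda s: len(tokenizer(s, add_special_tokens=False)["input_ids"])`.

```python
from typing import Callable, List, Sequence, Tuple

Turn = Tuple[str, str]  # (role, content), oldest first


def trim_to_fit(
    turns: Sequence[Turn],
    render: Callable[[Sequence[Turn]], str],
    count_tokens: Callable[[str], int],
    context_window: int = 2048,
    reserve_for_reply: int = 256,
) -> List[Turn]:
    """Drop the oldest turns until the rendered prompt leaves room for the reply.

    render turns the kept turns into the full prompt string (system message,
    turns, and the trailing assistant header). The most recent turn is always
    kept, even if it alone exceeds the budget.
    """
    budget = context_window - reserve_for_reply
    kept = list(turns)
    while len(kept) > 1 and count_tokens(render(kept)) > budget:
        kept.pop(0)  # discard the oldest turn first
    return kept


# Toy demo: whitespace-separated words stand in for real tokens.
history = [("user", "a b c"), ("assistant", "d e f"), ("user", "g h")]
print(trim_to_fit(history, lambda ts: " ".join(c for _, c in ts), lambda s: len(s.split()),
                  context_window=10, reserve_for_reply=5))
# [('assistant', 'd e f'), ('user', 'g h')]
```

Re-rendering the whole prompt on each check costs a little extra tokenization. It avoids assuming that token counts add up exactly across turn boundaries.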

How to Get Started with the Model

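from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "QuietImpostor/OpenELM-270M-Instruct-SonnOpus"
# OpenELM ships custom modeling code, so trust_remote_code=True is required.
# The weights are stored in float32 (float16 is poorly supported on CPU), and the
# model stays on the CPU (the default), where the author reports more coherent output than on CUDA.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

def generate_response(prompt, max_new_tokens=256):
    # The prompt already begins with <|begin_of_text|>, so don't let the tokenizer
    # prepend another BOS token. A single prompt needs no padding.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # caps the reply only; max_length would also count prompt tokens
            num_return_sequences=1,
            temperature=0.7,
            top_p=0.9,
            do_sample=True
        )
    # generate() returns the prompt followed by the continuation; decode only the new tokens.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response.strip()

# Example usage
system_msg = "You are a helpful AI assistant."
user_input = "Hello, how are you?"
prompt = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>{system_msg}<|eot_id|><|start_header_id|>user<|end_header_id|>{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
response = generate_response(prompt)
print(response)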

Model Files

  • Format: Safetensors
  • Model size: 272M params
  • Tensor type: F32