---
license: apple-ascl
---

# Model Card for Fine-tuned OpenELM-270M

## Model Details

### Basic Information
- **Model Name:** Fine-tuned OpenELM-270M
- **Model Type:** Causal Language Model
- **Base Model:** apple/OpenELM-270M-Instruct
- **Model Architecture:** Transformer-based language model
- **Parameters:** 270 million
- **Language(s):** English

### Model Architecture
OpenELM-270M is based on the transformer architecture and is designed for efficient language modeling. Its 270 million parameter configuration is relatively small compared to many modern language models.
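
As a quick sanity check on the parameter count, the base checkpoint can be loaded and its parameters summed. This is a minimal sketch; it assumes the base model is the public `apple/OpenELM-270M-Instruct` repository, which ships custom modeling code and therefore needs `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM

# Load the base checkpoint (OpenELM ships custom modeling code, hence trust_remote_code).
base = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    trust_remote_code=True,
)

# Sum the elements of every parameter tensor to confirm the ~270M figure.
num_params = sum(p.numel() for p in base.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```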

## Intended Use

This model is fine-tuned for general conversation and task completion. It is designed to engage in dialogue and provide information across a wide range of topics.

### Primary intended uses
- General conversation
- Question answering
- Task completion

### Out-of-scope use cases
- Generation of harmful or biased content
- Critical decision-making without human oversight
- Tasks requiring real-time or post-training knowledge

## Training Data

The model was fine-tuned on a synthetic dataset in which user queries were generated by GPT-4 and responses by Claude 3 Opus and Claude 3.5 Sonnet. This high-quality synthetic dataset covers a wide range of topics and task types.

### Dataset characteristics
- **Type:** Synthetic, instruction-following conversations
- **Domains covered:** Diverse, covering multiple areas of knowledge

## Performance and Limitations

### Performance Metrics
- **Training Loss:** Final loss of 1.3721 after 3 epochs
- **Real-world Use:** The model appears to struggle to maintain conversational context when run on CUDA; inference on CPU produces noticeably more coherent results.
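
Given that observation, forcing CPU inference in full float32 precision may be worth trying when coherence matters more than speed. This is a minimal sketch, not a confirmed fix; the float16 load used in the usage snippet later in this card is only one plausible culprit, and the repository id below matches that snippet.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "QuietImpostor/OpenELM-270M-Instruct-SonnOpus"

# Load in float32 and keep everything on the CPU, mirroring the setup that
# reportedly produced the more coherent outputs.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float32,
    trust_remote_code=True,
).to("cpu")
tokenizer = AutoTokenizer.from_pretrained(model_path)
```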

### Limitations and Current Shortcomings
- The model's knowledge is limited to its training data and cut-off date.
- It may occasionally produce inaccurate or inconsistent information.
- The model's performance on tasks requiring recent knowledge or specialized expertise may be limited.
- Current issues include:
  - Outputting special tokens in responses, which should be invisible to the user (a post-processing sketch follows this list).
  - Generating overly long responses that may be cut off due to context window limitations.
  - Potential difficulty in maintaining conversation context over multiple turns.
  - Occasionally generating responses that don't directly address the user's input.
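
Until the underlying cause of the leaked special tokens is fixed, they can be stripped from generations in post-processing. A minimal sketch, assuming the Llama 3-style markers listed in the input-format section below; the regex is illustrative and not part of the released code.

```python
import re

# Llama 3-style control tokens that may leak into responses.
SPECIAL_TOKEN_PATTERN = re.compile(
    r"<\|(?:begin_of_text|start_header_id|end_header_id|eot_id)\|>"
)

def strip_special_tokens(text: str) -> str:
    """Remove leaked control tokens from a generated response."""
    return SPECIAL_TOKEN_PATTERN.sub("", text).strip()

print(strip_special_tokens("Hello there!<|eot_id|>"))  # -> "Hello there!"
```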

## Ethical Considerations
- The model may reflect biases present in its training data.
- It should not be used for generating harmful, illegal, or discriminatory content.
- Users should be aware that the model can generate plausible-sounding but incorrect information.

## Caveats and Recommendations
- Always verify important information produced by the model against reliable sources.
- The model should be used as an assistive tool and not for making critical decisions without human oversight.
- Regular evaluation and fine-tuning may be necessary to maintain performance and relevance.

## Training Procedure

### Training Hyperparameters
- **Number of Epochs:** 3
- **Learning Rate:** Decayed over training, ending at approximately 1.58e-7

### Training Hardware
- **Hardware Type:** CPU (i7-11700)
- **Hours of Training:** Approximately 51 hours

### Framework and Tokenizer
- **Framework:** PyTorch, Transformers
- **Tokenizer:** Uses the Llama 3 chat format with its special tokens
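
The full training script is not published. For readers who want to reproduce a comparable fine-tune, the sketch below shows one plausible setup using the Hugging Face `Trainer`. Only the 3-epoch schedule, the 2048-token window, and the PyTorch/Transformers stack come from this card; the dataset name, batch size, initial learning rate, and scheduler are placeholders.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M-Instruct", trust_remote_code=True)
# Tokenizer taken from the released fine-tune, since this card says it uses the Llama 3 chat format.
tokenizer = AutoTokenizer.from_pretrained("QuietImpostor/OpenELM-270M-Instruct-SonnOpus")
tokenizer.pad_token = tokenizer.eos_token  # Llama-style tokenizers ship without a pad token

# Placeholder dataset: expects a "text" column already rendered in the chat format.
dataset = load_dataset("your-org/your-synthetic-chat-dataset", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="openelm-270m-finetune",
    num_train_epochs=3,              # matches the card
    per_device_train_batch_size=4,   # placeholder
    learning_rate=2e-5,              # placeholder; the card only reports the final decayed value
    lr_scheduler_type="cosine",      # placeholder
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```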

## Evaluation Results

Detailed evaluation results are not available, but the model showed consistent improvement in loss throughout training.

## Quantitative Analyses
- **Training Loss Curve:** The loss decreased from initial values around 2.1 to final values around 1.37-1.40, showing consistent improvement across epochs.

## Model Inputs and Outputs
- **Input Format:** Uses the Llama 3 chat format with the following structure:

  ```
  <|begin_of_text|>
  <|start_header_id|>system<|end_header_id|>[system_message]<|eot_id|>
  <|start_header_id|>user<|end_header_id|>[user_input]<|eot_id|>
  <|start_header_id|>assistant<|end_header_id|>
  ```

- **Output:** Generated text completions following the assistant's response format
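
Assembling this template by hand is error-prone, so a small helper can render a multi-turn conversation into the expected string. This is an illustrative sketch only; the helper name and message structure are not part of the released code, and it simply concatenates the markers shown above.

```python
def build_prompt(messages: list[dict]) -> str:
    """Render [{"role": ..., "content": ...}, ...] into the Llama 3 chat format."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>"
            f"{message['content']}<|eot_id|>"
        )
    # Leave the assistant header open so generation continues as the assistant.
    return prompt + "<|start_header_id|>assistant<|end_header_id|>"

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])
```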

## Technical Specifications
- **Context Window:** Initially 2048 tokens, with the potential to be increased to 4096 or 8192 tokens
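
When building prompts it is worth checking length against that 2048-token window before generating. A minimal sketch, assuming the tokenizer from the usage example below; the split between prompt and response budget is an arbitrary illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("QuietImpostor/OpenELM-270M-Instruct-SonnOpus")

CONTEXT_WINDOW = 2048   # tokens, per this card
MAX_NEW_TOKENS = 256    # arbitrary response budget for this example

def fits_in_context(prompt: str) -> bool:
    """Return True if the prompt leaves room for MAX_NEW_TOKENS of generation."""
    prompt_tokens = len(tokenizer(prompt)["input_ids"])
    return prompt_tokens + MAX_NEW_TOKENS <= CONTEXT_WINDOW
```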

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "QuietImpostor/OpenELM-270M-Instruct-SonnOpus"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Llama-style tokenizers ship without a pad token; padding=True below would
# otherwise raise an error.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def generate_response(prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
        )
    # Decode only the newly generated tokens so the prompt is not echoed back.
    generated = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(generated, skip_special_tokens=True)
    return response.strip()

# Example usage
system_msg = "You are a helpful AI assistant."
user_input = "Hello, how are you?"
prompt = (
    "<|begin_of_text|>"
    f"<|start_header_id|>system<|end_header_id|>{system_msg}<|eot_id|>"
    f"<|start_header_id|>user<|end_header_id|>{user_input}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>"
)
response = generate_response(prompt)
print(response)
```
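
The snippet keeps sampling modest (temperature 0.7, top-p 0.9). Given the coherence issues reported on CUDA earlier in this card, loading the model in `torch.float32` on CPU, as sketched in the Performance section, may be the more reliable configuration.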