---
license: apple-ascl
---

# Model Card for Fine-tuned OpenELM-270M

## Model Details

### Basic Information
- **Model Name:** Fine-tuned OpenELM-270M
- **Model Type:** Causal Language Model
- **Base Model:** apple/OpenELM-270M-Instruct
- **Model Architecture:** Transformer-based language model
- **Parameters:** 270 million
- **Language(s):** English

### Model Architecture
OpenELM-270M is based on the transformer architecture and is designed for efficient language modeling. Its 270 million parameter configuration is relatively small compared to many modern language models.
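
As a quick sanity check on the parameter count, the base checkpoint can be loaded and its parameters summed. This is a minimal sketch; it assumes the base model is the public `apple/OpenELM-270M-Instruct` repository, which ships custom modeling code and therefore needs `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM

# Load the base checkpoint (OpenELM ships custom modeling code, hence trust_remote_code).
base = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    trust_remote_code=True,
)

# Sum the elements of every parameter tensor to confirm the ~270M figure.
num_params = sum(p.numel() for p in base.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```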

## Intended Use

This model is fine-tuned for general conversation and task completion. It is designed to engage in dialogue and provide information across a wide range of topics.

### Primary intended uses
- General conversation
- Question answering
- Task completion

### Out-of-scope use cases
- Generation of harmful or biased content
- Critical decision-making without human oversight
- Tasks requiring real-time or post-training knowledge

## Training Data

The model was fine-tuned on a synthetic dataset in which user queries were generated by GPT-4 and responses by Claude 3 Opus and Claude 3.5 Sonnet. This high-quality synthetic dataset covers a wide range of topics and task types.

### Dataset characteristics
- **Type:** Synthetic, instruction-following conversations
- **Domains covered:** Diverse, covering multiple areas of knowledge

## Performance and Limitations

### Performance Metrics
- **Training Loss:** Final loss of 1.3721 after 3 epochs
- **Real-world Use:** The model appears to struggle to maintain conversational context when run on CUDA; inference on CPU produces noticeably more coherent results.
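
Given that observation, forcing CPU inference in full float32 precision may be worth trying when coherence matters more than speed. This is a minimal sketch, not a confirmed fix; the float16 load used in the usage snippet later in this card is only one plausible culprit, and the repository id below matches that snippet.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "QuietImpostor/OpenELM-270M-Instruct-SonnOpus"

# Load in float32 and keep everything on the CPU, mirroring the setup that
# reportedly produced the more coherent outputs.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float32,
    trust_remote_code=True,
).to("cpu")
tokenizer = AutoTokenizer.from_pretrained(model_path)
```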

### Limitations and Current Shortcomings
- The model's knowledge is limited to its training data and cut-off date.
- It may occasionally produce inaccurate or inconsistent information.
- The model's performance on tasks requiring recent knowledge or specialized expertise may be limited.
- Current issues include:
  - Outputting special tokens in responses, which should be invisible to the user (a post-processing sketch follows this list).
  - Generating overly long responses that may be cut off due to context window limitations.
  - Potential difficulty in maintaining conversation context over multiple turns.
  - Occasionally generating responses that don't directly address the user's input.
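
Until the underlying cause of the leaked special tokens is fixed, they can be stripped from generations in post-processing. A minimal sketch, assuming the Llama 3-style markers listed in the input-format section below; the regex is illustrative and not part of the released code.

```python
import re

# Llama 3-style control tokens that may leak into responses.
SPECIAL_TOKEN_PATTERN = re.compile(
    r"<\|(?:begin_of_text|start_header_id|end_header_id|eot_id)\|>"
)

def strip_special_tokens(text: str) -> str:
    """Remove leaked control tokens from a generated response."""
    return SPECIAL_TOKEN_PATTERN.sub("", text).strip()

print(strip_special_tokens("Hello there!<|eot_id|>"))  # -> "Hello there!"
```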

## Ethical Considerations
- The model may reflect biases present in its training data.
- It should not be used for generating harmful, illegal, or discriminatory content.
- Users should be aware that the model can generate plausible-sounding but incorrect information.

## Caveats and Recommendations
- Always verify important information produced by the model against reliable sources.
- The model should be used as an assistive tool and not for making critical decisions without human oversight.
- Regular evaluation and fine-tuning may be necessary to maintain performance and relevance.

## Training Procedure

### Training Hyperparameters
- **Number of Epochs:** 3
- **Learning Rate:** Decayed over training, ending at approximately 1.58e-7

### Training Hardware
- **Hardware Type:** CPU (i7-11700)
- **Hours of Training:** Approximately 51 hours

### Framework and Tokenizer
- **Framework:** PyTorch, Transformers
- **Tokenizer:** Uses the Llama 3 chat format with its special tokens
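
The full training script is not published. For readers who want to reproduce a comparable fine-tune, the sketch below shows one plausible setup using the Hugging Face `Trainer`. Only the 3-epoch schedule, the 2048-token window, and the PyTorch/Transformers stack come from this card; the dataset name, batch size, initial learning rate, and scheduler are placeholders.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M-Instruct", trust_remote_code=True)
# Tokenizer taken from the released fine-tune, since this card says it uses the Llama 3 chat format.
tokenizer = AutoTokenizer.from_pretrained("QuietImpostor/OpenELM-270M-Instruct-SonnOpus")
tokenizer.pad_token = tokenizer.eos_token  # Llama-style tokenizers ship without a pad token

# Placeholder dataset: expects a "text" column already rendered in the chat format.
dataset = load_dataset("your-org/your-synthetic-chat-dataset", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="openelm-270m-finetune",
    num_train_epochs=3,              # matches the card
    per_device_train_batch_size=4,   # placeholder
    learning_rate=2e-5,              # placeholder; the card only reports the final decayed value
    lr_scheduler_type="cosine",      # placeholder
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```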

## Evaluation Results

Detailed evaluation results are not available, but the model showed consistent improvement in loss throughout training.

## Quantitative Analyses
- **Training Loss Curve:** The loss decreased from initial values around 2.1 to final values around 1.37-1.40, showing consistent improvement across epochs.

## Model Inputs and Outputs
- **Input Format:** Uses the Llama 3 chat format with the following structure:

  ```
  <|begin_of_text|>
  <|start_header_id|>system<|end_header_id|>[system_message]<|eot_id|>
  <|start_header_id|>user<|end_header_id|>[user_input]<|eot_id|>
  <|start_header_id|>assistant<|end_header_id|>
  ```

- **Output:** Generated text completions following the assistant's response format
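
Assembling this template by hand is error-prone, so a small helper can render a multi-turn conversation into the expected string. This is an illustrative sketch only; the helper name and message structure are not part of the released code, and it simply concatenates the markers shown above.

```python
def build_prompt(messages: list[dict]) -> str:
    """Render [{"role": ..., "content": ...}, ...] into the Llama 3 chat format."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>"
            f"{message['content']}<|eot_id|>"
        )
    # Leave the assistant header open so generation continues as the assistant.
    return prompt + "<|start_header_id|>assistant<|end_header_id|>"

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])
```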

## Technical Specifications
- **Context Window:** Initially 2048 tokens, with the potential to be increased to 4096 or 8192 tokens
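
When building prompts it is worth checking length against that 2048-token window before generating. A minimal sketch, assuming the tokenizer from the usage example below; the split between prompt and response budget is an arbitrary illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("QuietImpostor/OpenELM-270M-Instruct-SonnOpus")

CONTEXT_WINDOW = 2048   # tokens, per this card
MAX_NEW_TOKENS = 256    # arbitrary response budget for this example

def fits_in_context(prompt: str) -> bool:
    """Return True if the prompt leaves room for MAX_NEW_TOKENS of generation."""
    prompt_tokens = len(tokenizer(prompt)["input_ids"])
    return prompt_tokens + MAX_NEW_TOKENS <= CONTEXT_WINDOW
```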

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "QuietImpostor/OpenELM-270M-Instruct-SonnOpus"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Llama-style tokenizers ship without a pad token; padding=True below would
# otherwise raise an error.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def generate_response(prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
        )
    # Decode only the newly generated tokens so the prompt is not echoed back.
    generated = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(generated, skip_special_tokens=True)
    return response.strip()

# Example usage
system_msg = "You are a helpful AI assistant."
user_input = "Hello, how are you?"
prompt = (
    "<|begin_of_text|>"
    f"<|start_header_id|>system<|end_header_id|>{system_msg}<|eot_id|>"
    f"<|start_header_id|>user<|end_header_id|>{user_input}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>"
)
response = generate_response(prompt)
print(response)
```
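
The snippet keeps sampling modest (temperature 0.7, top-p 0.9). Given the coherence issues reported on CUDA earlier in this card, loading the model in `torch.float32` on CPU, as sketched in the Performance section, may be the more reliable configuration.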