Model Card for Evaluate360M
Model Details
Model Description
Evaluate360M is a lightweight, 360M-parameter language model optimized for reasoning tasks. It is designed to run efficiently on low-end commercial hardware, such as mobile phones, while maintaining strong performance in logical reasoning and general-purpose applications.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: Transformer-based decoder model
- Language(s) (NLP): English
- License: [More Information Needed]
- Finetuned from model [optional]: HuggingFaceTB/SmolLM2-360M-Instruct
Model Sources
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
Evaluate360M is intended for general-purpose reasoning tasks and can be used in applications that require lightweight LLMs, such as:
- Mobile-based AI assistants
- Low-power embedded systems
- Edge computing applications
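For memory-constrained targets such as phones or embedded boards, the model can be loaded in reduced precision. The sketch below is illustrative only: "evaluate360m" stands in for the actual repository id (not yet published), and bfloat16 support depends on the target hardware (fall back to float16 or float32 where needed).

```python
# Illustrative low-memory loading for edge-class hardware.
# "evaluate360m" is a placeholder model id, not a confirmed Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "evaluate360m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # roughly halves memory versus float32
    low_cpu_mem_usage=True,      # avoid materializing a full fp32 copy while loading
)
model.eval()
```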
Downstream Use
It can be further fine-tuned for specific domains, including code generation, summarization, or dialogue systems.
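The card does not prescribe a particular fine-tuning recipe. As one illustrative option, the sketch below attaches a LoRA adapter with peft for parameter-efficient domain adaptation; it assumes Llama-style projection names (q_proj, v_proj), as used by the SmolLM2 family, and the model id and adapter hyperparameters are placeholders.

```python
# Illustrative parameter-efficient fine-tuning setup (LoRA via peft).
# This is not the card's own recipe; adapter hyperparameters are arbitrary examples.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("evaluate360m")  # placeholder id
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumes Llama-style attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```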
Out-of-Scope Use
- Not optimized for handling very large context windows
- Not designed for generating high-fidelity creative text, such as poetry or fiction
Bias, Risks, and Limitations
Limitations
- Limited context handling: training used sequences of up to 2048 tokens, so very long inputs are not well supported.
- Not yet evaluated for potential biases.
Recommendations
Users should be aware of the model’s limitations in context length and should evaluate its performance for their specific use cases.
How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; replace with the published repository path once available.
model_name = "evaluate360m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode a prompt and generate a short completion.
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
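Because the model is fine-tuned from an instruct checkpoint, prompting through the tokenizer's chat template is usually preferable to raw strings. Continuing from the snippet above, the example below assumes the tokenizer inherits SmolLM2-360M-Instruct's chat template; max_new_tokens is an arbitrary choice.

```python
# Chat-style prompting via the tokenizer's chat template
# (assumed to be inherited from SmolLM2-360M-Instruct).
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```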
Training Details
Training Data
- Dataset: HuggingFaceH4/Bespoke-Stratos-17k
- Preprocessing: token packing enabled (`--packing`), sequence length up to 2048 tokens
Training Procedure
- Optimizer & Precision: `bf16` mixed precision, `gradient_accumulation_steps = 8`
- Gradient checkpointing enabled
- Hyperparameters:
  - Learning rate: `2e-5`
  - Epochs: 3
  - Batch size: 4 (per device, for both training and evaluation)
- Evaluation & Saving:
  - Evaluation every 500 steps
  - Checkpoint saved every 1000 steps, keeping at most 2 checkpoints
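For reference, the hyperparameters above correspond roughly to the TRL SFT configuration sketched below. This is a reconstruction for illustration, not the published training script: the held-out evaluation split, output path, and exact argument names (which vary between trl versions, e.g. max_seq_length vs. max_length) are assumptions.

```python
# Approximate reconstruction of the training setup described above, using TRL's SFTTrainer.
# Illustrative only; argument names differ slightly across trl/transformers versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "HuggingFaceTB/SmolLM2-360M-Instruct"  # base checkpoint named in this card
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

dataset = load_dataset("HuggingFaceH4/Bespoke-Stratos-17k", split="train")
dataset = dataset.train_test_split(test_size=0.05, seed=42)  # held-out fraction is an assumption

args = SFTConfig(
    output_dir="evaluate360m",          # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    bf16=True,
    packing=True,
    max_seq_length=2048,                # renamed to max_length in newer trl releases
    eval_strategy="steps",              # evaluation_strategy in older transformers versions
    eval_steps=500,
    save_steps=1000,
    save_total_limit=2,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,         # called "tokenizer" in older trl versions
)
trainer.train()
```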
Compute Infrastructure
- Hardware Used: A100 GPU
- Training Time: 6 hours
Evaluation
- Benchmarks: No evaluation conducted yet.
- Metrics: Not available yet.
Environmental Impact
- Hardware Type: A100 GPU
- Hours Used: 6
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications
Model Architecture
- Similar to SmolLM2-360M
- Inspired by MobileLLM
- Uses Grouped-Query Attention (GQA)
- Prioritizes depth over width
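The architectural properties listed above can be checked directly from the model configuration; the snippet below simply prints the relevant fields (the model id is again a placeholder).

```python
# Inspect depth, width, and grouped-query attention settings from the config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("evaluate360m")  # placeholder id
print("layers:", config.num_hidden_layers)             # depth
print("hidden size:", config.hidden_size)              # width
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)  # fewer KV heads than attention heads => GQA
```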
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
More Information
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]