Model Card for Evaluate360M
Model Details
Model Description
Evaluate360M is a lightweight, 360M-parameter language model optimized for reasoning tasks. It is designed to run efficiently on low-end commercial hardware, such as mobile phones, while maintaining strong performance in logical reasoning and general-purpose applications.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: Transformer-based decoder model
- Language(s) (NLP): English
- License: [More Information Needed]
- Finetuned from model [optional]: HuggingFaceTB/SmolLM2-360M-Instruct
Model Sources
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
Evaluate360M is intended for general-purpose reasoning tasks and can be used in applications that require lightweight LLMs, such as:
- Mobile-based AI assistants
- Low-power embedded systems
- Edge computing applications
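For memory-constrained targets such as phones or embedded boards, the model can be loaded in reduced precision. The sketch below is illustrative only: "evaluate360m" stands in for the actual repository id (not yet published), and bfloat16 support depends on the target hardware (fall back to float16 or float32 where needed).

```python
# Illustrative low-memory loading for edge-class hardware.
# "evaluate360m" is a placeholder model id, not a confirmed Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "evaluate360m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # roughly halves memory versus float32
    low_cpu_mem_usage=True,      # avoid materializing a full fp32 copy while loading
)
model.eval()
```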
Downstream Use
It can be further fine-tuned for specific domains, including code generation, summarization, or dialogue systems.
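The card does not prescribe a particular fine-tuning recipe. As one illustrative option, the sketch below attaches a LoRA adapter with peft for parameter-efficient domain adaptation; it assumes Llama-style projection names (q_proj, v_proj), as used by the SmolLM2 family, and the model id and adapter hyperparameters are placeholders.

```python
# Illustrative parameter-efficient fine-tuning setup (LoRA via peft).
# This is not the card's own recipe; adapter hyperparameters are arbitrary examples.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("evaluate360m")  # placeholder id
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumes Llama-style attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```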
Out-of-Scope Use
- Not optimized for handling very large context windows
- Not designed for generating high-fidelity creative text, such as poetry or fiction
Bias, Risks, and Limitations
Limitations
- Limited context handling: training used sequences of up to 2048 tokens, so very long inputs are not well supported.
- Not yet evaluated for potential biases.
Recommendations
Users should be aware of the model’s limitations in context length and should evaluate its performance for their specific use cases.
How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; replace with the published repository path once available.
model_name = "evaluate360m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode a prompt and generate a short completion.
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
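Because the model is fine-tuned from an instruct checkpoint, prompting through the tokenizer's chat template is usually preferable to raw strings. Continuing from the snippet above, the example below assumes the tokenizer inherits SmolLM2-360M-Instruct's chat template; max_new_tokens is an arbitrary choice.

```python
# Chat-style prompting via the tokenizer's chat template
# (assumed to be inherited from SmolLM2-360M-Instruct).
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```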
Training Details
Training Data
- Dataset: HuggingFaceH4/Bespoke-Stratos-17k
- Preprocessing: token packing enabled (`--packing`), sequence length up to 2048 tokens
Training Procedure
- Optimizer & Precision: `bf16` mixed precision, `gradient_accumulation_steps = 8`
- Gradient checkpointing enabled
- Hyperparameters:
  - Learning rate: `2e-5`
  - Epochs: 3
  - Batch size: 4 (per device, for both training and evaluation)
- Evaluation & Saving:
  - Evaluation every 500 steps
  - Checkpoint saved every 1000 steps, keeping at most 2 checkpoints
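For reference, the hyperparameters above correspond roughly to the TRL SFT configuration sketched below. This is a reconstruction for illustration, not the published training script: the held-out evaluation split, output path, and exact argument names (which vary between trl versions, e.g. max_seq_length vs. max_length) are assumptions.

```python
# Approximate reconstruction of the training setup described above, using TRL's SFTTrainer.
# Illustrative only; argument names differ slightly across trl/transformers versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "HuggingFaceTB/SmolLM2-360M-Instruct"  # base checkpoint named in this card
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

dataset = load_dataset("HuggingFaceH4/Bespoke-Stratos-17k", split="train")
dataset = dataset.train_test_split(test_size=0.05, seed=42)  # held-out fraction is an assumption

args = SFTConfig(
    output_dir="evaluate360m",          # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    bf16=True,
    packing=True,
    max_seq_length=2048,                # renamed to max_length in newer trl releases
    eval_strategy="steps",              # evaluation_strategy in older transformers versions
    eval_steps=500,
    save_steps=1000,
    save_total_limit=2,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,         # called "tokenizer" in older trl versions
)
trainer.train()
```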
Compute Infrastructure
- Hardware Used: A100 GPU
- Training Time: 6 hours
Evaluation
- Benchmarks: No evaluation conducted yet.
- Metrics: Not available yet.
Environmental Impact
- Hardware Type: A100 GPU
- Hours Used: 6
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications
Model Architecture
- Similar to SmolLM2-360M
- Inspired by MobileLLM
- Uses Grouped-Query Attention (GQA)
- Prioritizes depth over width
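The architectural properties listed above can be checked directly from the model configuration; the snippet below simply prints the relevant fields (the model id is again a placeholder).

```python
# Inspect depth, width, and grouped-query attention settings from the config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("evaluate360m")  # placeholder id
print("layers:", config.num_hidden_layers)             # depth
print("hidden size:", config.hidden_size)              # width
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)  # fewer KV heads than attention heads => GQA
```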
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
More Information
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]