# Model Card for SmolLM-135M-Instruct-layer-width-pruned-90M-raw

## Model Details

### Model Description
This model is a pruned version of HuggingFaceTB/SmolLM-135M-Instruct.
The pruning procedure reduced both the number of layers and the hidden dimensions, decreasing the parameter count from 134M to ~93M (a ~30.5% reduction).

**⚠️ Important Note:**
This model has not been fine-tuned after pruning. Because entire layers and parts of the weight matrices were removed, it will not produce accurate outputs in its current state. To make it useful, apply knowledge distillation or fine-tuning.
- Developed by: Independent modification (original model: HuggingFaceTB)
- Model type: Causal Language Model (decoder-only, LLaMA architecture)
- Language(s) (NLP): English (same as original SmolLM training corpus)
- License: Inherits license from the original SmolLM-135M-Instruct
- Finetuned from model: HuggingFaceTB/SmolLM-135M-Instruct
### Model Sources

- Repository: [Original SmolLM](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- Paper: N/A
- Demo: N/A
## Uses

### Direct Use

- ⚠️ Not suitable for inference out of the box.
- Intended for research in pruning, model compression, and architecture efficiency experiments.
### Downstream Use

- Can be fine-tuned or distilled on downstream NLP tasks (instruction following, summarization, dialogue, etc.) to regain performance; a minimal fine-tuning sketch follows this list.
- Useful as a smaller backbone for constrained environments (edge devices, prototyping).
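
The sketch below shows one way such fine-tuning could be set up with the Hugging Face `Trainer`. The dataset (`wikitext`) and all hyperparameters are placeholder assumptions for illustration, not settings used for this checkpoint:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder ids -- substitute the real pruned checkpoint and your own data.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/SmolLM-135M-Instruct-layer-width-pruned-90M-raw"
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Any text or instruction dataset works; wikitext is used purely as an example.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.filter(lambda row: len(row["text"].strip()) > 0)
dataset = dataset.map(
    lambda rows: tokenizer(rows["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="smollm-pruned-finetuned",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        learning_rate=5e-5,
        logging_steps=50,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```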
### Out-of-Scope Use
- Do not expect reliable outputs without fine-tuning.
- Not suitable for production or safety-critical tasks.
- Not intended for generating factual, unbiased, or safe text without retraining.
## Bias, Risks, and Limitations
- Risks: In its current state, outputs are nonsensical and potentially misleading.
- Biases: Same biases as the original SmolLM training data; pruning may further amplify instability.
- Limitations: Lower representational capacity due to fewer layers and hidden units, so accuracy is expected to remain below the original even after retraining.
### Recommendations

- Perform knowledge distillation from the original model onto this pruned version (see the sketch after this list).
- Apply fine-tuning for task-specific usage.
- Do not use for real-world decision-making without retraining and evaluation.
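
A minimal sketch of the recommended distillation, assuming the original SmolLM-135M-Instruct as teacher and this checkpoint as student. The temperature, loss weighting, and toy batch are illustrative assumptions, not a prescribed recipe:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

teacher = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct").eval()
student = AutoModelForCausalLM.from_pretrained(
    "your-username/SmolLM-135M-Instruct-layer-width-pruned-90M-raw"
)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
temperature = 2.0  # softens the distributions; value is an assumption

def distill_step(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    out = student(**batch, labels=labels)
    # Soft-target loss: KL divergence between teacher and student token distributions.
    kl = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Blend the soft-target loss with the ordinary causal-LM loss on hard labels.
    loss = 0.5 * kl + 0.5 * out.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Toy usage; in practice iterate over a real corpus.
print(distill_step(["The capital of France is Paris.", "Water boils at 100 degrees Celsius."]))
```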
## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the actual repository id of the pruned checkpoint.
model_name = "your-username/SmolLM-135M-Instruct-layer-width-pruned-90M-raw"

# The tokenizer is unchanged, so the original SmolLM tokenizer is reused.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```

⚠️ The outputs are not meaningful until the model is fine-tuned.
## Training Details

### Training Data
- Same as original SmolLM-135M-Instruct.
- No new training performed after pruning.
### Training Procedure

- Step 1: Layer pruning: 25 of the 30 decoder layers were kept.
- Step 2: Hidden-dimension pruning: hidden size 576 → 504; intermediate size 1536 → 1344 (a hedged sketch of both steps follows below).
- No fine-tuning has been performed yet.
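
For reference, the sketch below reconstructs what the two steps can look like in code. The choice of which five layers to drop and which hidden units to keep is illustrative, not the exact selection used for this checkpoint, and a complete script would also have to slice the embeddings, attention projections, and norms and update the config:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")

# --- Step 1: layer pruning -------------------------------------------------
# Keep 25 of the 30 decoder layers; which 5 are dropped is an assumption here.
keep = [i for i in range(30) if i not in {5, 11, 17, 23, 29}]
model.model.layers = torch.nn.ModuleList(model.model.layers[i] for i in keep)
model.config.num_hidden_layers = len(keep)

# --- Step 2: width pruning -------------------------------------------------
# Shrinking hidden_size 576 -> 504 and intermediate_size 1536 -> 1344 means
# slicing rows/columns out of the affected weight matrices so all shapes stay
# consistent. Shown for a single MLP only; a real script repeats this for
# every module and records the new sizes in model.config.
def prune_linear(linear, in_idx=None, out_idx=None):
    """Return a new nn.Linear keeping only the selected input/output indices."""
    w = linear.weight.data
    if out_idx is not None:
        w = w[out_idx, :]
    if in_idx is not None:
        w = w[:, in_idx]
    new = torch.nn.Linear(w.shape[1], w.shape[0], bias=linear.bias is not None)
    new.weight.data.copy_(w)
    if linear.bias is not None:
        b = linear.bias.data if out_idx is None else linear.bias.data[out_idx]
        new.bias.data.copy_(b)
    return new

inter_idx = torch.arange(1344)  # keep the first 1344 intermediate units (illustrative)
mlp = model.model.layers[0].mlp
mlp.gate_proj = prune_linear(mlp.gate_proj, out_idx=inter_idx)
mlp.up_proj = prune_linear(mlp.up_proj, out_idx=inter_idx)
mlp.down_proj = prune_linear(mlp.down_proj, in_idx=inter_idx)
```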
#### Training Hyperparameters
- No training performed. Model is raw after pruning.
## Evaluation

### Testing Data, Factors & Metrics
- No evaluation performed post-pruning.
### Results

- Model reduced from 134.5M to ~93.4M parameters (reproducible with the snippet below).
- ~30.5% reduction in size.
- Accuracy and output quality degraded (requires fine-tuning).
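
Assuming both checkpoints are reachable (the pruned repository id below is a placeholder), the parameter counts can be verified with a short snippet:

```python
from transformers import AutoModelForCausalLM

for name in ("HuggingFaceTB/SmolLM-135M-Instruct",
             "your-username/SmolLM-135M-Instruct-layer-width-pruned-90M-raw"):
    model = AutoModelForCausalLM.from_pretrained(name)
    # Total parameter count in millions.
    print(name, sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")
```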
## Environmental Impact

Minimal, since no retraining has been performed; the only compute used was for pruning and saving the checkpoint.
- Hardware Type: Single GPU (pruning experiment)
- Hours used: <1
- Cloud Provider: N/A
- Carbon Emitted: Negligible
## Technical Specifications

### Model Architecture and Objective
- Based on LLaMA decoder-only transformer.
- Objective: next-token prediction (causal LM).
- Modified architecture (a config sketch follows below):
  - Layers: 30 → 25
  - Hidden size: 576 → 504
  - Intermediate size: 1536 → 1344
  - Attention heads: 9 (unchanged)
  - Key/Value heads: 3 (unchanged)
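
Expressed as `LlamaConfig` fields, the modified values correspond roughly to the sketch below; the checkpoint's own `config.json` remains authoritative, and unrelated fields (vocabulary size, rope settings, context length) are omitted here:

```python
from transformers import LlamaConfig

# Sketch of the pruned architecture; only the fields changed or kept fixed
# by the pruning procedure are shown.
pruned_config = LlamaConfig(
    num_hidden_layers=25,    # 30 -> 25
    hidden_size=504,         # 576 -> 504 (head_dim 504 / 9 = 56)
    intermediate_size=1344,  # 1536 -> 1344
    num_attention_heads=9,   # unchanged
    num_key_value_heads=3,   # unchanged
)
```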
### Compute Infrastructure
- Hardware: Single consumer GPU (e.g., RTX series)
- Software: PyTorch, Hugging Face Transformers 4.57.0