# Model Card for SmolLM-135M-Instruct-layer-width-pruned-90M-raw

## Model Details

### Model Description
This model is a pruned version of HuggingFaceTB/SmolLM-135M-Instruct.
The pruning procedure reduced both the number of layers and the hidden dimensions, decreasing the parameter count from 134M to ~93M (a ~30.5% reduction).

**⚠️ Important Note:**
This model has not been fine-tuned after pruning. Because entire layers and parts of the weight matrices were removed, it will not produce accurate outputs in its current state. To make it useful, apply knowledge distillation or fine-tuning.
- Developed by: Independent modification (original model: HuggingFaceTB)
- Model type: Causal Language Model (decoder-only, LLaMA architecture)
- Language(s) (NLP): English (same as original SmolLM training corpus)
- License: Inherits license from the original SmolLM-135M-Instruct
- Finetuned from model: HuggingFaceTB/SmolLM-135M-Instruct
### Model Sources

- Repository: [Original SmolLM](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- Paper: N/A
- Demo: N/A
## Uses

### Direct Use

- ⚠️ Not suitable for inference out of the box.
- Intended for research in pruning, model compression, and architecture efficiency experiments.
### Downstream Use

- Can be fine-tuned or distilled on downstream NLP tasks (instruction following, summarization, dialogue, etc.) to regain performance; a minimal fine-tuning sketch follows this list.
- Useful as a smaller backbone for constrained environments (edge devices, prototyping).
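
The sketch below shows one way such fine-tuning could be set up with the Hugging Face `Trainer`. The dataset (`wikitext`) and all hyperparameters are placeholder assumptions for illustration, not settings used for this checkpoint:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder ids -- substitute the real pruned checkpoint and your own data.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/SmolLM-135M-Instruct-layer-width-pruned-90M-raw"
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Any text or instruction dataset works; wikitext is used purely as an example.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.filter(lambda row: len(row["text"].strip()) > 0)
dataset = dataset.map(
    lambda rows: tokenizer(rows["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="smollm-pruned-finetuned",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        learning_rate=5e-5,
        logging_steps=50,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```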
### Out-of-Scope Use
- Do not expect reliable outputs without fine-tuning.
- Not suitable for production or safety-critical tasks.
- Not intended for generating factual, unbiased, or safe text without retraining.
## Bias, Risks, and Limitations
- Risks: In its current state, outputs are nonsensical and potentially misleading.
- Biases: Same biases as the original SmolLM training data; pruning may further amplify instability.
- Limitations: Lower representational capacity due to fewer layers and hidden units, so accuracy is expected to remain below the original even after retraining.
### Recommendations

- Perform knowledge distillation from the original model onto this pruned version (see the sketch after this list).
- Apply fine-tuning for task-specific usage.
- Do not use for real-world decision-making without retraining and evaluation.
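
A minimal sketch of the recommended distillation, assuming the original SmolLM-135M-Instruct as teacher and this checkpoint as student. The temperature, loss weighting, and toy batch are illustrative assumptions, not a prescribed recipe:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

teacher = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct").eval()
student = AutoModelForCausalLM.from_pretrained(
    "your-username/SmolLM-135M-Instruct-layer-width-pruned-90M-raw"
)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
temperature = 2.0  # softens the distributions; value is an assumption

def distill_step(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    out = student(**batch, labels=labels)
    # Soft-target loss: KL divergence between teacher and student token distributions.
    kl = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Blend the soft-target loss with the ordinary causal-LM loss on hard labels.
    loss = 0.5 * kl + 0.5 * out.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Toy usage; in practice iterate over a real corpus.
print(distill_step(["The capital of France is Paris.", "Water boils at 100 degrees Celsius."]))
```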
## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the actual repository id of the pruned checkpoint.
model_name = "your-username/SmolLM-135M-Instruct-layer-width-pruned-90M-raw"

# The tokenizer is unchanged, so the original SmolLM tokenizer is reused.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```

⚠️ The outputs are not meaningful until the model is fine-tuned.
## Training Details

### Training Data
- Same as original SmolLM-135M-Instruct.
- No new training performed after pruning.
### Training Procedure

- Step 1: Layer pruning: 25 of the 30 decoder layers were kept.
- Step 2: Hidden-dimension pruning: hidden size 576 → 504; intermediate size 1536 → 1344 (a hedged sketch of both steps follows below).
- No fine-tuning has been performed yet.
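
For reference, the sketch below reconstructs what the two steps can look like in code. The choice of which five layers to drop and which hidden units to keep is illustrative, not the exact selection used for this checkpoint, and a complete script would also have to slice the embeddings, attention projections, and norms and update the config:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")

# --- Step 1: layer pruning -------------------------------------------------
# Keep 25 of the 30 decoder layers; which 5 are dropped is an assumption here.
keep = [i for i in range(30) if i not in {5, 11, 17, 23, 29}]
model.model.layers = torch.nn.ModuleList(model.model.layers[i] for i in keep)
model.config.num_hidden_layers = len(keep)

# --- Step 2: width pruning -------------------------------------------------
# Shrinking hidden_size 576 -> 504 and intermediate_size 1536 -> 1344 means
# slicing rows/columns out of the affected weight matrices so all shapes stay
# consistent. Shown for a single MLP only; a real script repeats this for
# every module and records the new sizes in model.config.
def prune_linear(linear, in_idx=None, out_idx=None):
    """Return a new nn.Linear keeping only the selected input/output indices."""
    w = linear.weight.data
    if out_idx is not None:
        w = w[out_idx, :]
    if in_idx is not None:
        w = w[:, in_idx]
    new = torch.nn.Linear(w.shape[1], w.shape[0], bias=linear.bias is not None)
    new.weight.data.copy_(w)
    if linear.bias is not None:
        b = linear.bias.data if out_idx is None else linear.bias.data[out_idx]
        new.bias.data.copy_(b)
    return new

inter_idx = torch.arange(1344)  # keep the first 1344 intermediate units (illustrative)
mlp = model.model.layers[0].mlp
mlp.gate_proj = prune_linear(mlp.gate_proj, out_idx=inter_idx)
mlp.up_proj = prune_linear(mlp.up_proj, out_idx=inter_idx)
mlp.down_proj = prune_linear(mlp.down_proj, in_idx=inter_idx)
```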
#### Training Hyperparameters
- No training performed. Model is raw after pruning.
## Evaluation

### Testing Data, Factors & Metrics
- No evaluation performed post-pruning.
### Results

- Model reduced from 134.5M to ~93.4M parameters (reproducible with the snippet below).
- ~30.5% reduction in size.
- Accuracy and output quality degraded (requires fine-tuning).
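
Assuming both checkpoints are reachable (the pruned repository id below is a placeholder), the parameter counts can be verified with a short snippet:

```python
from transformers import AutoModelForCausalLM

for name in ("HuggingFaceTB/SmolLM-135M-Instruct",
             "your-username/SmolLM-135M-Instruct-layer-width-pruned-90M-raw"):
    model = AutoModelForCausalLM.from_pretrained(name)
    # Total parameter count in millions.
    print(name, sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")
```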
## Environmental Impact

Minimal, since no retraining has been performed; the only compute used was for pruning and saving the checkpoint.
- Hardware Type: Single GPU (pruning experiment)
- Hours used: <1
- Cloud Provider: N/A
- Carbon Emitted: Negligible
## Technical Specifications

### Model Architecture and Objective
- Based on LLaMA decoder-only transformer.
- Objective: next-token prediction (causal LM).
- Modified architecture (a config sketch follows below):
  - Layers: 30 → 25
  - Hidden size: 576 → 504
  - Intermediate size: 1536 → 1344
  - Attention heads: 9 (unchanged)
  - Key/Value heads: 3 (unchanged)
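
Expressed as `LlamaConfig` fields, the modified values correspond roughly to the sketch below; the checkpoint's own `config.json` remains authoritative, and unrelated fields (vocabulary size, rope settings, context length) are omitted here:

```python
from transformers import LlamaConfig

# Sketch of the pruned architecture; only the fields changed or kept fixed
# by the pruning procedure are shown.
pruned_config = LlamaConfig(
    num_hidden_layers=25,    # 30 -> 25
    hidden_size=504,         # 576 -> 504 (head_dim 504 / 9 = 56)
    intermediate_size=1344,  # 1536 -> 1344
    num_attention_heads=9,   # unchanged
    num_key_value_heads=3,   # unchanged
)
```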
### Compute Infrastructure
- Hardware: Single consumer GPU (e.g., RTX series)
- Software: PyTorch, Hugging Face Transformers 4.57.0