---
language: en
tags:
- llama
- template-mlp
- parameter-efficient
- mlp-modification
datasets:
- none
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# RECASTMLP-LLaMA

This model implements a parameter-efficient modification of the LLaMA architecture by replacing the standard per-layer MLPs with template-based shared MLPs. It retains LLaMA's attention mechanism while reducing the number of parameters in the feed-forward networks.

## Model Description

### Overview

RECASTMLP-LLaMA modifies the original LLaMA architecture by introducing template banks for MLP layers. Instead of having separate MLP weights for each transformer layer, it uses a shared set of template weights that are combined using learned coefficients.
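
Conceptually, each layer's MLP weights are produced by mixing the templates in its group's bank with that layer's coefficients. The snippet below is a minimal PyTorch sketch of this idea; the class and attribute names (`TemplateBankMLP`, `bank_gate`, ...) and the softmax mixing rule are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemplateBankMLP(nn.Module):
    """Sketch: a SwiGLU-style MLP whose weights are a learned mix of shared templates."""

    def __init__(self, bank_gate, bank_up, bank_down, num_templates):
        super().__init__()
        # Template banks shared by all layers in a group,
        # each of shape (num_templates, out_features, in_features).
        self.bank_gate = bank_gate
        self.bank_up = bank_up
        self.bank_down = bank_down
        # Layer-specific mixing coefficients over the templates (assumed form).
        self.coefficients = nn.Parameter(torch.randn(num_templates))

    def _mix(self, bank):
        # Weighted sum of templates -> one weight matrix for this layer.
        alpha = torch.softmax(self.coefficients, dim=0)
        return torch.einsum("t,toi->oi", alpha, bank)

    def forward(self, x):
        gate = F.linear(x, self._mix(self.bank_gate))
        up = F.linear(x, self._mix(self.bank_up))
        return F.linear(F.silu(gate) * up, self._mix(self.bank_down))

# Toy shapes only: 4 templates, hidden size 8, intermediate size 16.
bank_gate = nn.Parameter(torch.randn(4, 16, 8))
bank_up = nn.Parameter(torch.randn(4, 16, 8))
bank_down = nn.Parameter(torch.randn(4, 8, 16))
mlp = TemplateBankMLP(bank_gate, bank_up, bank_down, num_templates=4)
out = mlp(torch.randn(2, 5, 8))  # (batch, seq, hidden)
```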

### Architecture Details

- **Base Model:** LLaMA 3.1 8B
- **Number of Templates:** 4
- **Number of Groups:** 8
- **Coefficients per Template:** 1
- **Coefficients:** 392
- **Hidden Size:** 4096
- **Intermediate Size:** 14336
- **Number of Attention Heads:** 32
- **Number of Key-Value Heads:** 8
- **Number of Layers:** 32
- **Max Position Embeddings:** 131072
- **Vocabulary Size:** 128256
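
Most of these values can be checked from the checkpoint's configuration once it is downloaded. The sketch below assumes the custom config exposes the template settings under names like `num_templates` and `num_groups`; those attribute names are a guess, not documented fields.

```python
from transformers import AutoConfig

# Inspect the architecture details listed above.
config = AutoConfig.from_pretrained("appledora/RECASTMLP-llama3.1-f8t4", trust_remote_code=True)

print(config.hidden_size)          # 4096
print(config.intermediate_size)    # 14336
print(config.num_hidden_layers)    # 32
print(config.num_attention_heads)  # 32
print(config.num_key_value_heads)  # 8

# Template-bank fields are assumptions about the custom config; they may be named differently.
print(getattr(config, "num_templates", "not exposed"))  # expected 4
print(getattr(config, "num_groups", "not exposed"))     # expected 8
```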

### Key Features

1. **Template Banks:** Uses shared template weights across groups of layers
2. **Parameter Efficiency:** Reduces the total number of parameters by sharing MLP weights
3. **Group-wise Sharing:** Organizes layers into groups that share template banks
4. **Coefficient Learning:** Uses learned coefficients to combine template weights
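
For intuition on group-wise sharing: with 32 layers and 8 groups, each group would cover 4 consecutive layers drawing from the same 4-template bank. The exact layer-to-group assignment used by the released model is not documented here, so the even split below is an assumption.

```python
# Assumed even split of 32 layers into 8 groups (4 consecutive layers per group).
num_layers, num_groups = 32, 8
layers_per_group = num_layers // num_groups

def group_for_layer(layer_idx: int) -> int:
    return layer_idx // layers_per_group

print([group_for_layer(i) for i in range(num_layers)])
# [0, 0, 0, 0, 1, 1, 1, 1, ..., 7, 7, 7, 7]
```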

## Usage

```python
from transformers import AutoModel, AutoTokenizer

# Load model and tokenizer
model = AutoModel.from_pretrained("appledora/RECASTMLP-llama3.1-f8t4", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8b")

# Prepare input
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")

# Forward pass: the base model returns hidden states rather than generated text
outputs = model(**inputs)
hidden_states = outputs.last_hidden_state
```
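
Because the pipeline tag is text-generation, actual generation would normally go through a causal-LM head and `generate()`. Whether the repository's remote code registers such a class with `AutoModelForCausalLM` is an assumption here; if it only provides the base model, use the hidden-state example above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the remote code exposes a causal-LM variant of the model.
model = AutoModelForCausalLM.from_pretrained("appledora/RECASTMLP-llama3.1-f8t4", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8b")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```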