---
language: en
tags:
- llama
- template-mlp
- parameter-efficient
- mlp-modification
datasets:
- none
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
# RECASTMLP-LLaMA
This model implements a parameter-efficient modification of the LLaMA architecture by replacing the standard MLP layers with template-based shared MLPs. The model maintains LLaMA's attention mechanism while reducing parameters in the feed-forward networks.
## Model Description
### Overview
RECASTMLP-LLaMA modifies the original LLaMA architecture by introducing template banks for MLP layers. Instead of having separate MLP weights for each transformer layer, it uses a shared set of template weights that are combined using learned coefficients.
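A minimal NumPy sketch of the idea follows. It assumes that each layer's MLP weight is a plain weighted sum of the shared templates, with one scalar coefficient per template (matching "Coefficients per Template: 1" below). The names `templates` and `layer_weight`, and the toy sizes, are hypothetical and are not taken from the model's remote code.

```python
import numpy as np

# Hypothetical toy sizes; the real model uses hidden=4096, intermediate=14336.
NUM_TEMPLATES = 4
HIDDEN, INTERMEDIATE = 8, 16

rng = np.random.default_rng(0)

# Shared template bank: NUM_TEMPLATES candidate weight matrices for one MLP projection.
templates = rng.standard_normal((NUM_TEMPLATES, INTERMEDIATE, HIDDEN))

def layer_weight(coeffs: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Combine the bank into one layer's weight: W = sum_t coeffs[t] * bank[t]."""
    # tensordot contracts the template axis, giving an (INTERMEDIATE, HIDDEN) matrix
    return np.tensordot(coeffs, bank, axes=1)

# Each layer owns only NUM_TEMPLATES scalars (one coefficient per template).
coeffs_layer0 = np.array([0.5, 0.2, 0.2, 0.1])
coeffs_layer1 = np.array([0.1, 0.1, 0.3, 0.5])

w0 = layer_weight(coeffs_layer0, templates)
w1 = layer_weight(coeffs_layer1, templates)
```

Only the four scalars differ between `w0` and `w1`; the template matrices are stored once and reused by both layers.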
### Architecture Details
- **Base Model:** LLaMA 3.1 8B
- **Number of Templates:** 4
- **Number of Groups:** 8
- **Coefficients per Template:** 1
- **Coefficients:** 392
- **Hidden Size:** 4096
- **Intermediate Size:** 14336
- **Number of Attention Heads:** 32
- **Number of Key-Value Heads:** 8
- **Number of Layers:** 32
- **Max Position Embeddings:** 131072
- **Vocabulary Size:** 128256
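The dense baseline that template sharing is meant to shrink can be computed from the sizes above. This sketch assumes the standard LLaMA gated MLP (bias-free `gate_proj`, `up_proj` and `down_proj`) and counts only MLP weights, not attention or embedding parameters; `dense_mlp_params` is a hypothetical helper.

```python
# Sizes taken from the Architecture Details list above.
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 14336
NUM_LAYERS = 32

def dense_mlp_params(hidden: int, intermediate: int) -> int:
    """Weights in one LLaMA-style gated MLP: gate_proj and up_proj map
    hidden -> intermediate, down_proj maps intermediate -> hidden; no biases."""
    return 3 * hidden * intermediate

per_layer = dense_mlp_params(HIDDEN_SIZE, INTERMEDIATE_SIZE)
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}")   # per layer: 176,160,768
print(f"all layers: {total:,}")      # all layers: 5,637,144,576
```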
### Key Features
1. **Template Banks:** Uses shared template weights across groups of layers
2. **Parameter Efficiency:** Reduces the total number of parameters by sharing MLP weights
3. **Group-wise Sharing:** Organizes layers into groups that share template banks
4. **Coefficient Learning:** Uses learned coefficients to combine template weights
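Group-wise sharing can be sketched by mapping each layer to a group and mixing that group's template bank with the layer's own coefficients. The layout here (contiguous, equal-sized groups, each owning one bank of 4 templates) is an assumption for illustration only: the card lists 8 groups and 4 templates but does not state how layers are assigned to groups. `group_of`, `banks`, `coeffs` and `layer_weight` are hypothetical names.

```python
import numpy as np

# Assumed (hypothetical) layout: NUM_LAYERS split into NUM_GROUPS contiguous,
# equal-sized groups; each group owns its own bank of NUM_TEMPLATES templates.
NUM_LAYERS, NUM_GROUPS, NUM_TEMPLATES = 32, 8, 4
HIDDEN, INTERMEDIATE = 8, 16  # toy sizes for illustration only

LAYERS_PER_GROUP = NUM_LAYERS // NUM_GROUPS

def group_of(layer_idx: int) -> int:
    """Index of the group (and therefore the template bank) a layer uses."""
    return layer_idx // LAYERS_PER_GROUP

rng = np.random.default_rng(0)
# banks[g] holds the NUM_TEMPLATES shared templates of group g.
banks = rng.standard_normal((NUM_GROUPS, NUM_TEMPLATES, INTERMEDIATE, HIDDEN))
# One learned coefficient per template for every layer.
coeffs = rng.standard_normal((NUM_LAYERS, NUM_TEMPLATES))

def layer_weight(layer_idx: int) -> np.ndarray:
    """Mix the layer's group bank with that layer's own coefficients."""
    bank = banks[group_of(layer_idx)]
    return np.tensordot(coeffs[layer_idx], bank, axes=1)
```

Under this assumption, layers 4 to 7 all read from `banks[1]` and differ only in their rows of `coeffs`.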
## Usage
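```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model (custom architecture, so remote code must be trusted) and the base LLaMA 3.1 tokenizer
model = AutoModel.from_pretrained("appledora/RECASTMLP-llama3.1-f8t4", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
model.eval()

# Prepare input
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")

# Run a forward pass; AutoModel returns the base model, so this yields hidden states, not generated text
with torch.no_grad():
    outputs = model(**inputs)
hidden_states = outputs.last_hidden_state  # (batch, seq_len, hidden_size)
```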