appledora/RECASTMLP-llama3.1-f8t4

	---
	language: en
	tags:
	- llama
	- template-mlp
	- parameter-efficient
	- mlp-modification
	datasets:
	- none
	license: apache-2.0
	pipeline_tag: text-generation
	library_name: transformers
	---

	# RECASTMLP-LLaMA

	This model is a parameter-efficient variant of the LLaMA architecture: the standard per-layer MLPs are replaced with MLPs built from shared templates. LLaMA's attention mechanism is kept as is, and the parameter reduction applies to the feed-forward networks.

	## Model Description

	### Overview
	RECASTMLP-LLaMA modifies the original LLaMA architecture by introducing template banks for MLP layers. Instead of having separate MLP weights for each transformer layer, it uses a shared set of template weights that are combined using learned coefficients.
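
One way to write the combination described above, as a sketch only: the symbols below are notation introduced here for illustration, not taken from the model's code. Let $g(l)$ be the group that layer $l$ belongs to, $T^{(g)}_k$ the $k$-th template in that group's bank, and $c^{(l)}_k$ the learned scalar coefficient for layer $l$ and template $k$. With $K$ templates per bank, each MLP weight matrix of layer $l$ would then be

$$
W^{(l)} = \sum_{k=1}^{K} c^{(l)}_{k} \, T^{(g(l))}_{k}
$$

Under this reading, only the coefficients are specific to a layer, while the templates are shared by every layer in the same group.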

	### Architecture Details
	- Base Model: LLaMA 3.1 8B
	- Number of Templates: 4
	- Number of Groups: 8
	- Coefficients per Template: 1
	- Coefficients: 392
	- Hidden Size: 4096
	- Intermediate Size: 14336
	- Number of Attention Heads: 32
	- Number of Key-Value Heads: 8
	- Number of Layers: 32
	- Max Position Embeddings: 131072
	- Vocabulary Size: 128256
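
The sizes listed above are enough to work out how many weights the MLP blocks hold in an unmodified LLaMA 3.1 8B. The sketch below does that arithmetic. The `shared_mlp_params` helper is hypothetical and assumes that every template is a full-size set of MLP weights, which the card does not state; it is left parametric because the card does not say how groups and templates map onto banks.

```python
# Values taken from the "Architecture Details" list.
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 14336
NUM_LAYERS = 32

# A LLaMA MLP has three bias-free projections: gate, up, and down.
PROJECTIONS_PER_MLP = 3


def mlp_params_per_layer(hidden=HIDDEN_SIZE, intermediate=INTERMEDIATE_SIZE):
    """Weights in one standard LLaMA MLP block."""
    return PROJECTIONS_PER_MLP * hidden * intermediate


def baseline_mlp_params(num_layers=NUM_LAYERS):
    """MLP weights when every layer owns its own MLP."""
    return num_layers * mlp_params_per_layer()


def shared_mlp_params(num_banks, templates_per_bank, num_coefficients=0):
    """MLP weights when layers draw on shared template banks.

    Assumption (not stated in the card): every template is a full-size
    set of MLP weights, and the only extra parameters are the scalar
    mixing coefficients.
    """
    return num_banks * templates_per_bank * mlp_params_per_layer() + num_coefficients


print(mlp_params_per_layer())   # 176160768
print(baseline_mlp_params())    # 5637144576
```

Under the full-size-template assumption, sharing reduces the MLP parameter count only when `num_banks * templates_per_bank` is smaller than the number of layers.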


	### Key Features
	1. Template Banks: Uses shared template weights across groups of layers
	2. Parameter Efficiency: Reduces the total number of parameters by sharing MLP weights
	3. Group-wise Sharing: Organizes layers into groups that share template banks
	4. Coefficient Learning: Uses learned coefficients to combine template weights
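
The features above can be illustrated with a toy example in plain Python. This is a minimal sketch, not the model's implementation: `layer_to_group` and `combine_templates` are hypothetical helpers, the grouping is assumed to be contiguous and equal-sized, and the 2x2 templates stand in for full MLP weight matrices.

```python
def layer_to_group(layer_idx, num_layers=32, num_groups=8):
    """Map a layer index to its group, assuming contiguous equal-size groups."""
    layers_per_group = num_layers // num_groups
    return layer_idx // layers_per_group


def combine_templates(templates, coefficients):
    """Return the coefficient-weighted sum of equally shaped 2-D templates."""
    rows, cols = len(templates[0]), len(templates[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for template, coeff in zip(templates, coefficients):
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += coeff * template[i][j]
    return combined


# Toy bank of two 2x2 templates (real templates would be MLP-sized).
bank = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]

# Two layers in the same group reuse the bank with different coefficients.
weights_layer_a = combine_templates(bank, [1.0, 0.5])
weights_layer_b = combine_templates(bank, [0.0, 1.0])
print(weights_layer_a)  # [[1.0, 1.0], [1.0, 1.0]]
print(weights_layer_b)  # [[0.0, 2.0], [2.0, 0.0]]
```

Both layers read from the same stored templates, so the only per-layer parameters in this sketch are the two coefficients.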

	## Usage

	```python
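	import torch
	from transformers import AutoModel, AutoTokenizer

	# Load the model (custom architecture, so remote code must be trusted)
	# and the tokenizer of the base LLaMA 3.1 8B model
	model = AutoModel.from_pretrained("appledora/RECASTMLP-llama3.1-f8t4", trust_remote_code=True)
	tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
	model.eval()

	# Prepare input
	text = "Hello, how are you?"
	inputs = tokenizer(text, return_tensors="pt")

	# Run a forward pass without gradient tracking.
	# AutoModel returns hidden states, not generated text.
	with torch.no_grad():
	    outputs = model(**inputs)
	hidden_states = outputs.last_hidden_state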