---
license: other
library_name: transformers
---
This model is made with the intention of being used for fine-tuning. It should not be used for inference as is. This is a pruned version of Meta-Llama-3-70B-Instruct.

Meta-Llama-3-70B-Instruct has 70.6 billion parameters and Drobeta-Turnu-Severin has 44.9 billion (~63% of the original parameter count).
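As a rough illustration of the intended workflow, the sketch below loads the pruned checkpoint as a starting point for fine-tuning with `transformers`. The repository id is a placeholder, not the actual location of this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/Drobeta-Turnu-Severin"  # placeholder repo id (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Fine-tune from here (e.g. full fine-tuning or LoRA/QLoRA); the pruned
# weights are not expected to generate coherent text without further training.
```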
Steps to replicate:
Use `laserQlora.ipynb` from cognitivecomputations/laserRMT to determine which layers should be eliminated.

Adapt the script for Meta-Llama-3-70B-Instruct by replacing `model_name = "mistralai/Mistral-7B-v0.1"` with `model_name = "Meta-Llama-3-70B-Instruct"` and `layer_numbers = list(range(31, -1, -1))` with `layer_numbers = list(range(79, -1, -1))`, 79 being the index of the last decoder layer in Meta-Llama-3-70B-Instruct.
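For reference, the two edited assignments in the notebook would then read as follows (the names are exactly as quoted above; the rest of the notebook is unchanged):

```python
# Adapted configuration in laserQlora.ipynb
model_name = "Meta-Llama-3-70B-Instruct"   # was "mistralai/Mistral-7B-v0.1"
layer_numbers = list(range(79, -1, -1))    # was list(range(31, -1, -1)); scan all 80 decoder layers in reverse
```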
Then look for the layer indices where the `self_attn.v_proj` SNR is Infinity and eliminate those layers using mergekit. Here are the layer indices that were eliminated: 11, 17, 37, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69.
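The selection step can be expressed as a small helper. This is a minimal sketch, assuming the per-layer SNR values for `self_attn.v_proj` have already been collected from the notebook run into a plain dict; that dict is an assumption made for illustration, not laserRMT's actual output format.

```python
import math

def layers_to_eliminate(v_proj_snr_by_layer: dict[int, float]) -> list[int]:
    """Return the layer indices whose self_attn.v_proj SNR came out as Infinity."""
    return sorted(idx for idx, snr in v_proj_snr_by_layer.items() if math.isinf(snr))

# Applied to this model's scan, the selection should reproduce the 30 indices
# listed above (11, 17, 37, 40-46, 48-51, 53-55, 57-69).
```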
Here is the mergekit config:
```yaml
slices:
  - sources:
      - model: "meta-llama/Meta-Llama-3-70B-Instruct"
        layer_range: [0, 11]
  - sources:
      - model: "meta-llama/Meta-Llama-3-70B-Instruct"
        layer_range: [12, 17]
  - sources:
      - model: "meta-llama/Meta-Llama-3-70B-Instruct"
        layer_range: [18, 37]
  - sources:
      - model: "meta-llama/Meta-Llama-3-70B-Instruct"
        layer_range: [38, 40]
  - sources:
      - model: "meta-llama/Meta-Llama-3-70B-Instruct"
        layer_range: [47, 48]
  - sources:
      - model: "meta-llama/Meta-Llama-3-70B-Instruct"
        layer_range: [52, 53]
  - sources:
      - model: "meta-llama/Meta-Llama-3-70B-Instruct"
        layer_range: [56, 57]
  - sources:
      - model: "meta-llama/Meta-Llama-3-70B-Instruct"
        layer_range: [70, 80]
merge_method: passthrough
dtype: bfloat16
```
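Saving the config above as, for example, `config.yaml` (the filename is arbitrary), the merge can then be produced with mergekit's command-line entry point, e.g. `mergekit-yaml config.yaml ./Drobeta-Turnu-Severin`, where the second argument is the output directory for the pruned model.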