Add some clarification on what exactly this model is
README.md CHANGED

@@ -2,7 +2,7 @@
 license: other
 ---
 **What is it?**
-Llama 2 13b expanded to the size of a Llama 1 33b model in certain areas, with the empty surrounding space filled with Llama 1 33b data. (Base model: https://huggingface.co/chargoddard/llama2-22b-blocktriangular) This is then finetuned on a 3090 by creating large LoRAs and merging them. When I first started with 22b models, I looked for signs of knowledge transfer but didn't see any, so that's not a goal - the goal is just to throw lots of data at it until it adapts well to its surgically implanted parts.
+Llama 2 13b expanded to the size of a Llama 1 33b model in certain areas, with the empty surrounding space filled with Llama 1 33b data. (Base model: https://huggingface.co/chargoddard/llama2-22b-blocktriangular) This is then finetuned on a 3090 by creating large LoRAs and merging them. When I first started with 22b models, I looked for signs of knowledge transfer but didn't see any, so that's not a goal - the goal is just to throw lots of data at it until it adapts well to its surgically implanted parts. Datasets used are a mix of instruction, roleplay, and conversational data, often curated.
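The "create large LoRAs and merge them" step described above can be sketched with the peft library. This is a minimal illustration of that general workflow, not the author's exact pipeline: the adapter path and output directory are placeholders, and only the base model name comes from the card.

```python
# Sketch: merge a trained LoRA adapter back into the 22b base model with peft.
# ADAPTER and the output directory are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

BASE = "chargoddard/llama2-22b-blocktriangular"  # base model named in the card
ADAPTER = "path/to/trained-lora-adapter"         # hypothetical: output dir of a LoRA finetune

# Load the expanded 22b base model in fp16.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, device_map="auto")

# Attach the trained LoRA adapter, then fold its deltas into the base weights.
lora = PeftModel.from_pretrained(base, ADAPTER)
merged = lora.merge_and_unload()

# Save the merged full-weight model for the next finetuning round or for inference.
merged.save_pretrained("merged-22b")             # hypothetical output directory
```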