Update README.md
README.md CHANGED
@@ -11,11 +11,11 @@ fp16 merged weights can be found here: https://huggingface.co/bhenrym14/airoboro
 
 ## Overview
 
-This is [Jon Durbin's Airoboros 13B GPT4 1.4](https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.4) (
+This is [Jon Durbin's Airoboros 13B GPT4 1.4](https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.4) (LoRA weights) with several key modifications:
 - Context length extended to 8192 by RoPE Scaled Embeddings, but NOT via the superHOT LoRA. I started with base Llama-13b.
 - Training sequences beyond 2048 have the target truncated to equal 2048.
 - Used airoboros-gpt4-1.4.1 dataset instead of airoboros-gpt4-1.4
 - **This is a QLoRA fine-tune**. The original 13b model is a full fine-tune.
 
-It was trained on 1x RTX 6000 Ada for ~
+It was trained on 1x RTX 6000 Ada for ~17 hours.
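For readers unfamiliar with the technique, the sketch below illustrates what "RoPE Scaled Embeddings" means in the README: position indices are compressed by a fixed factor before the rotary sin/cos tables are built, so an 8192-token sequence maps into the 2048-position range the base Llama-13b model was pretrained on. This is a minimal illustration, not the training code from this repo; the class name `ScaledRotaryEmbedding` and the `scale=4.0` value (8192 / 2048) are assumptions for the example.

```python
import torch

class ScaledRotaryEmbedding(torch.nn.Module):
    """Minimal sketch of linearly scaled RoPE (assumed scale = 8192 / 2048 = 4)."""

    def __init__(self, dim, base=10000, scale=4.0):
        super().__init__()
        # Standard RoPE inverse frequencies, one per pair of head dimensions.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)
        self.scale = scale

    def forward(self, seq_len, device="cpu"):
        # Compress the position indices by `scale` before building the
        # sin/cos tables; everything else is vanilla RoPE.
        t = torch.arange(seq_len, device=device, dtype=torch.float32) / self.scale
        freqs = torch.einsum("i,j->ij", t, self.inv_freq.to(device))
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()

# Example: build tables covering an 8192-token context for a 128-dim head.
cos, sin = ScaledRotaryEmbedding(dim=128)(seq_len=8192)
```

The same scale factor has to be applied at inference time; otherwise positions beyond 2048 fall outside the distribution the fine-tune saw.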