update blog link
README.md CHANGED
@@ -1,8 +1,7 @@
 ---
 language:
 - en
-inference:
-thumbnail: https://www.cerebras.net/wp-content/uploads/2022/05/Cerebras-Logo-Black.png
+inference: true
 tags:
 - pytorch
 - causal-lm
@@ -16,7 +15,7 @@ pipeline_tag: text-generation
 
 # BTLM-3B-8k-base
 
-Bittensor Language Model (BTLM-3B-8k-base) is a 3 billion parameter language model with an 8k context length trained on 627B tokens of [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). BTLM-3B-8k-base sets a new standard for 3B parameter models, outperforming models trained on hundreds of billions more tokens and achieving comparable performance to open 7B parameter models. BTLM-3B-8k-base can also be quantized to 4-bit to fit in devices with as little as 3GB of memory. The model is made available with an Apache 2.0 license for commercial use.
+[Bittensor Language Model (BTLM-3B-8k-base)](https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/) is a 3 billion parameter language model with an 8k context length trained on 627B tokens of [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). BTLM-3B-8k-base sets a new standard for 3B parameter models, outperforming models trained on hundreds of billions more tokens and achieving comparable performance to open 7B parameter models. BTLM-3B-8k-base can also be quantized to 4-bit to fit in devices with as little as 3GB of memory. The model is made available with an Apache 2.0 license for commercial use.
 
 BTLM was trained by [Cerebras](https://www.cerebras.net/) in partnership with [Opentensor](https://opentensor.ai/) on the newly unveiled [Condor Galaxy 1 (CG-1) supercomputer](https://www.cerebras.net/blog/introducing-condor-galaxy-1-a-4-exaflop-supercomputer-for-generative-ai/), the first public deliverable of the G42-Cerebras strategic partnership.
 
@@ -128,7 +127,7 @@ Figure 4: Performance at 7B model size
 - Optimizer: AdamW
 - Positional Encoding: ALiBi
 - Language: English
-- Learn more:
+- Learn more: [BTLM-3B-8k blog post](https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/)
 - Paper: Coming soon
 
 ## To continue training with PyTorch and Maximal Update Parameterization
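
The model card text in this diff notes that BTLM-3B-8k-base can be quantized to 4-bit to fit in roughly 3GB of memory. As a minimal sketch of what loading it that way could look like with Hugging Face Transformers, assuming the `cerebras/btlm-3b-8k-base` Hub ID, `trust_remote_code=True` for the model's custom code, and bitsandbytes 4-bit settings (none of which appear in this commit):

```python
# Hypothetical loading sketch, not part of this commit: loads BTLM-3B-8k-base
# in 4-bit via bitsandbytes and runs a short generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "cerebras/btlm-3b-8k-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit quantization config; the compute dtype here is an assumption,
# chosen to keep memory use near the ~3GB figure mentioned in the card.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # BTLM ships custom model code on the Hub
)

inputs = tokenizer("BTLM-3B-8k-base is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```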