update blog link
README.md CHANGED
@@ -1,8 +1,7 @@
 ---
 language:
 - en
-inference:
-thumbnail: https://www.cerebras.net/wp-content/uploads/2022/05/Cerebras-Logo-Black.png
+inference: true
 tags:
 - pytorch
 - causal-lm
@@ -16,7 +15,7 @@ pipeline_tag: text-generation
 
 # BTLM-3B-8k-base
 
-Bittensor Language Model (BTLM-3B-8k-base) is a 3 billion parameter language model with an 8k context length trained on 627B tokens of [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). BTLM-3B-8k-base sets a new standard for 3B parameter models, outperforming models trained on hundreds of billions more tokens and achieving comparable performance to open 7B parameter models. BTLM-3B-8k-base can also be quantized to 4-bit to fit in devices with as little as 3GB of memory. The model is made available with an Apache 2.0 license for commercial use.
+[Bittensor Language Model (BTLM-3B-8k-base)](https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/) is a 3 billion parameter language model with an 8k context length trained on 627B tokens of [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). BTLM-3B-8k-base sets a new standard for 3B parameter models, outperforming models trained on hundreds of billions more tokens and achieving comparable performance to open 7B parameter models. BTLM-3B-8k-base can also be quantized to 4-bit to fit in devices with as little as 3GB of memory. The model is made available with an Apache 2.0 license for commercial use.
 
 BTLM was trained by [Cerebras](https://www.cerebras.net/) in partnership with [Opentensor](https://opentensor.ai/) on the newly unveiled [Condor Galaxy 1 (CG-1) supercomputer](https://www.cerebras.net/blog/introducing-condor-galaxy-1-a-4-exaflop-supercomputer-for-generative-ai/), the first public deliverable of the G42-Cerebras strategic partnership.
 
@@ -128,7 +127,7 @@ Figure 4: Performance at 7B model size
 - Optimizer: AdamW
 - Positional Encoding: ALiBi
 - Language: English
-- Learn more:
+- Learn more: [BTLM-3B-8k blog post](https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/)
 - Paper: Coming soon
 
 ## To continue training with PyTorch and Maximal Update Parameterization
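
The model card text in this diff notes that BTLM-3B-8k-base can be quantized to 4-bit to fit in roughly 3GB of memory. As a minimal sketch of what loading it that way could look like with Hugging Face Transformers, assuming the `cerebras/btlm-3b-8k-base` Hub ID, `trust_remote_code=True` for the model's custom code, and bitsandbytes 4-bit settings (none of which appear in this commit):

```python
# Hypothetical loading sketch, not part of this commit: loads BTLM-3B-8k-base
# in 4-bit via bitsandbytes and runs a short generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "cerebras/btlm-3b-8k-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit quantization config; the compute dtype here is an assumption,
# chosen to keep memory use near the ~3GB figure mentioned in the card.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # BTLM ships custom model code on the Hub
)

inputs = tokenizer("BTLM-3B-8k-base is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```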