---
pipeline_tag: text-generation
tags:
- text-generation-inference
- backpack
- backpackmodel
library_name: transformers
license: apache-2.0
datasets:
- openwebtext
language:
- en
---
# Model Card for Levanter-Backpack-1.4B
This is a 1.4B-parameter version of the [Backpack architecture](https://arxiv.org/abs/2305.16765), intended to combine strong modeling performance
with an interface for interpretability and control.
# Training Details
## Training Data
This model was trained on the [OpenWebText](https://huggingface.co/datasets/openwebtext) corpus.
## Training Procedure
This model was trained for 450k gradient steps with a learning rate that follows a cosine decay from 1e-4 to zero, after a linear warmup of 5k steps.
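The schedule described above can be sketched in a few lines of plain Python. This is only an illustration, not the Levanter training code: the function name `learning_rate` is hypothetical, and the sketch assumes that the 5k warmup steps are counted inside the 450k total and that warmup starts from zero, neither of which is stated in this card.

```python
import math

PEAK_LR = 1e-4          # peak learning rate, from the model card
WARMUP_STEPS = 5_000    # linear warmup length, from the model card
TOTAL_STEPS = 450_000   # total gradient steps; ASSUMED to include warmup


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # ASSUMPTION: warmup ramps linearly from 0 up to PEAK_LR.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 2_500, 5_000, 227_500, 450_000):
        print(f"step {step:>7}: lr = {learning_rate(step):.3e}")
```

Under these assumptions the rate peaks at step 5,000, falls to half of the peak midway through the decay phase, and reaches zero at the final step.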
# Environmental Impact
- **Hardware Type:** v3-128 TPU (128 cores, 2TB Memory)
- **Hours used:** Roughly 8.6 days.
- **Cloud Provider:** Google Cloud Platform
- **Compute Region:** North America.
## Model Architecture and Objective
This model was trained to minimize the cross-entropy loss, and is a [Backpack language model](https://arxiv.org/pdf/2305.16765.pdf).
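For readers unfamiliar with the objective, the following is a minimal sketch of next-token cross-entropy in plain Python. It is not taken from the training code: the helper names `cross_entropy` and `sequence_loss` are hypothetical, and averaging over positions is an assumption about how the per-token losses are reduced.

```python
import math


def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def sequence_loss(logits_per_position: list[list[float]], token_ids: list[int]) -> float:
    """Mean next-token loss: the logits at position t are scored against token t + 1.

    ASSUMPTION: the per-token losses are averaged over positions.
    """
    losses = [
        cross_entropy(logits_per_position[t], token_ids[t + 1])
        for t in range(len(token_ids) - 1)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy example with a 4-token vocabulary and uniform logits:
    # every prediction has probability 1/4, so the loss is log(4).
    tokens = [0, 2, 1]
    uniform = [[0.0, 0.0, 0.0, 0.0] for _ in tokens]
    print(sequence_loss(uniform, tokens))
```

With uniform logits the loss equals the log of the vocabulary size, which is a useful sanity check for an untrained model.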
### Software
This model was trained with [Levanter](https://github.com/stanford-crfm/levanter/) and [JAX](https://github.com/google/jax).
### Loss Curve
![Loss Curve](assets/train_loss.png)
# How to Get Started with the Model
Please install `transformers`, `safetensors` and `torch` to use this model.
```bash
pip install transformers safetensors torch
```
Run the following Python code:
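```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "stanford-crfm/levanter-backpack-1b"

# The Backpack model uses custom modeling code hosted with the checkpoint,
# so trust_remote_code=True is required for both the config and the model.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
torch_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
)
torch_model.eval()

# Random token ids as a smoke test: batch of 1, sequence length 512.
input_ids = torch.randint(0, 50264, (1, 512), dtype=torch.long)

# Disable gradient tracking for inference.
with torch.no_grad():
    torch_out = torch_model(input_ids, position_ids=None)

# Convert logits to next-token probabilities over the vocabulary.
probs = torch.nn.functional.softmax(torch_out.logits, dim=-1)
print(probs.shape)
```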