|
--- |
|
pipeline_tag: text-generation |
|
tags: |
|
- text-generation-inference |
|
- backpack |
|
- backpackmodel |
|
library_name: transformers |
|
license: apache-2.0 |
|
datasets: |
|
- openwebtext |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for Levanter-Backpack-1.4B |
|
This is a 1.4B-parameter model using the [Backpack architecture](https://arxiv.org/abs/2305.16765), intended to combine strong modeling performance with an interface for interpretability and control.
|
|
|
# Training Details |
|
|
|
## Training Data |
|
This model was trained on the [OpenWebText](https://huggingface.co/datasets/openwebtext) corpus. |
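
For reference, the corpus can be loaded with the `datasets` library. This is a minimal loading sketch only; the exact preprocessing and tokenization used for training are not reproduced here, and depending on your `datasets` version this script-based dataset may additionally require `trust_remote_code=True`.

```python
from datasets import load_dataset

# Stream OpenWebText instead of downloading the full corpus up front.
ds = load_dataset("openwebtext", split="train", streaming=True)
print(next(iter(ds))["text"][:200])
```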
|
## Training Procedure |
|
|
|
This model was trained for 450k gradient steps with a cosine learning-rate decay from 1e-4 to zero, preceded by a linear warmup of 5k steps.
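
A minimal sketch of that schedule, assuming a standard warmup-then-cosine form (illustrative only; the exact Levanter training configuration is not reproduced here):

```python
import math

def lr_at(step, peak_lr=1e-4, warmup_steps=5_000, total_steps=450_000):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(2_500), lr_at(5_000), lr_at(450_000))  # mid-warmup, peak, end of training
```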
|
|
|
# Environmental Impact |
|
|
|
- **Hardware Type:** TPU v3-128 (128 cores, 2 TB memory)

- **Hours used:** Roughly 206 hours (8.6 days)

- **Cloud Provider:** Google Cloud Platform

- **Compute Region:** North America
|
|
|
## Model Architecture and Objective |
|
|
|
This model is a [Backpack language model](https://arxiv.org/pdf/2305.16765.pdf), trained to minimize the standard autoregressive cross-entropy loss.
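
In the Backpack architecture, each vocabulary item is assigned a fixed set of sense vectors, and the representation at each position is a non-negative weighted sum of the sense vectors of the context words, with the weights produced by a Transformer over the prefix; these per-sense weights provide the interpretability and control interface mentioned above. A toy sketch of that weighted sum (illustrative shapes and names only, not the model's actual modules):

```python
import torch

# Toy dimensions (the real 1.4B model is much larger).
vocab, n_senses, d_model, seq_len = 100, 16, 64, 8

sense_vectors = torch.randn(vocab, n_senses, d_model)  # several sense vectors per vocabulary item
token_ids = torch.randint(0, vocab, (seq_len,))

# alpha[i, j, s]: non-negative weight that position i places on sense s of context word j.
# In the real model these weights come from a Transformer over the prefix; here they are random.
alpha = torch.rand(seq_len, seq_len, n_senses)

senses = sense_vectors[token_ids]                     # (seq_len, n_senses, d_model)
outputs = torch.einsum("ijs,jsd->id", alpha, senses)  # weighted sum over context senses
print(outputs.shape)                                  # (seq_len, d_model)
```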
|
|
|
### Software |
|
|
|
This model was trained with [Levanter](https://github.com/stanford-crfm/levanter/) and [Jax](https://github.com/google/jax). |
|
|
|
### Loss Curve |
|
![Loss Curve](assets/train_loss.png) |
|
|
|
# How to Get Started with the Model |
|
|
|
Please install `transformers`, `safetensors` and `torch` to use this model. |
|
|
|
```bash |
|
pip install transformers safetensors torch |
|
``` |
|
|
|
Run the following Python code: |
|
|
|
```python |
|
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "stanford-crfm/levanter-backpack-1b"

# trust_remote_code is required because the Backpack architecture ships as
# custom modeling code in the model repository.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
torch_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
)
torch_model.eval()

# A batch of random token ids (vocabulary size 50264) to exercise the forward pass.
input_ids = torch.randint(0, 50264, (1, 512), dtype=torch.long)
torch_out = torch_model(input_ids, position_ids=None)
probs = torch.nn.functional.softmax(torch_out.logits, dim=-1)
print(probs.shape)  # (batch, seq_len, vocab_size)
|
``` |
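
The snippet above feeds random token ids. To score real text, pair the model with a tokenizer. The sketch below continues from the snippet above (reusing `torch_model`) and assumes a GPT-2-style BPE vocabulary; if the repository ships its own tokenizer files, use those instead.

```python
from transformers import AutoTokenizer

# Assumption: a GPT-2-style BPE tokenizer matches this checkpoint's vocabulary.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = torch_model(inputs["input_ids"], position_ids=None)

# Greedy next-token prediction from the last position.
next_token_id = int(out.logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```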
|
|