OLMo-Bitnet-1B
OLMo-Bitnet-1B is a 1B parameter model trained using the method described in The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.
It was trained on the first 60B tokens of the Dolma dataset, so it is merely a research proof-of-concept to test out the methodology.
A separate model was trained with exactly the same hyperparameters, but using standard fp16 weights. The comparison can be found in this wandb report.
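For readers unfamiliar with the method, the weight quantization step described in the referenced paper can be sketched as follows: each weight matrix is scaled by its mean absolute value, then rounded and clipped to the ternary set {-1, 0, +1}. This is only a minimal illustration in plain Python, not the training code used for this model. The function name `absmean_ternary_quantize`, the `eps` default, and the example values are assumptions made for the sketch, and activation quantization is omitted.

```python
import math

def absmean_ternary_quantize(weights, eps=1e-5):
    """Quantize a 2-D list of floats to {-1, 0, +1} with absmean scaling.

    Hypothetical helper for illustration only; not part of OLMo or transformers.
    """
    flat = [w for row in weights for w in row]
    # gamma is the mean absolute value of the whole weight matrix
    gamma = sum(abs(w) for w in flat) / len(flat)
    scale = gamma + eps  # eps guards against division by zero
    # Scale, round to the nearest integer, then clip to [-1, 1]
    quantized = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return quantized, gamma

# Made-up example weights
weights = [[0.9, -0.05, -1.2], [0.02, 0.4, -0.6]]
ternary, gamma = absmean_ternary_quantize(weights)
print(ternary)

# Three possible values per weight carry log2(3) bits of information,
# which is where the "1.58 bits" in the paper title comes from.
bits_per_weight = math.log2(3)
print(round(bits_per_weight, 2))
```

Small weights collapse to 0 and large ones saturate at -1 or +1, so only the sign and a coarse magnitude survive, with `gamma` kept as a per-matrix scale.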
Sample inference code
pip install ai2-olmo
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, TextStreamer

# Load the tokenizer and model; trust_remote_code is required for the custom model code
tokenizer = AutoTokenizer.from_pretrained("NousResearch/OLMo-Bitnet-1B")
model = AutoModelForCausalLM.from_pretrained("NousResearch/OLMo-Bitnet-1B",
    torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, pad_token_id=tokenizer.eos_token_id,
    temperature=0.8, repetition_penalty=1.1, do_sample=True, streamer=streamer)
pipe("The capitol of Paris is", max_new_tokens=256)
Training was performed using OLMo.