OLMo-Bitnet-1B

OLMo-Bitnet-1B is a 1B-parameter model trained using the method described in "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits".
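As a rough illustration of the weight quantization that paper describes, the sketch below maps a handful of weights to the ternary values {-1, 0, +1} using absmean scaling: divide by the mean absolute weight, round, then clip to [-1, 1]. It is not the code used to train OLMo-Bitnet-1B. The function name `absmean_ternary_quantize`, the `eps` value, and the example weights are assumptions made for illustration, and the sketch leaves out activation quantization and everything else that happens during training.

```python
import math

def absmean_ternary_quantize(weights, eps=1e-5):
    """Quantize a list of floats to {-1, 0, +1} using absmean scaling.

    Hypothetical helper for illustration only; not taken from the
    OLMo-Bitnet-1B training code.
    """
    # Scale factor: the mean absolute value of the weights.
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    # Note: Python's round() rounds exact .5 ties to the nearest even integer.
    ternary = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return ternary, gamma

# Made-up example weights (an assumption, not real model weights).
weights = [0.9, -0.05, 0.4, -1.2, 0.02]
ternary, gamma = absmean_ternary_quantize(weights)

# Three possible values per weight carry log2(3) bits of information,
# which is where the "1.58 bits" in the paper title comes from.
bits_per_weight = math.log2(3)

print(ternary, gamma, bits_per_weight)
```

Small weights collapse to 0 and large ones saturate at -1 or +1, so each weight only records a sign or a zero, with `gamma` kept as a single shared scale.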

It was trained on only the first 60B tokens of the Dolma dataset, so it is merely a research proof of concept to test the methodology.

A separate training run used the exact same hyperparameters but standard fp16 weights. The comparison can be found in this wandb report.

Sample inference code

pip install ai2-olmo

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, TextStreamer

# Load the tokenizer and model; trust_remote_code is needed for the custom OLMo model code.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/OLMo-Bitnet-1B")
model = AutoModelForCausalLM.from_pretrained("NousResearch/OLMo-Bitnet-1B",
    torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, pad_token_id=tokenizer.eos_token_id,
    temperature=0.8, repetition_penalty=1.1, do_sample=True, streamer=streamer)
pipe("The capital of Paris is", max_new_tokens=256)

Training was performed using OLMo.
