Continuous Autoregressive Language Models

Paper · GitHub · Hugging Face · Project Page

Model Description

Modern Large Language Models (LLMs) are constrained by a fundamental bottleneck: they generate text one token at a time. CALM (Continuous Autoregressive Language Models) confronts this challenge by introducing a paradigm shift in language modeling. Instead of predicting one discrete token at a time, CALM learns to predict a single continuous vector that represents an entire chunk of K tokens.

This is achieved through a two-stage process:

  1. A high-fidelity autoencoder learns to compress K tokens into a single vector and reconstruct them with near-perfect accuracy.
  2. A continuous-domain language model then performs autoregressive prediction in this vector space.
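To make the two stages concrete, here is a minimal toy sketch (not the paper's architecture): a linear "autoencoder" stands in for Stage 1, compressing a chunk of K token embeddings into one continuous vector, and a step count illustrates why Stage 2's chunk-level autoregression needs K times fewer generation steps. All sizes (K, d_tok, d_latent) are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: chunk length, per-token embedding dim, latent dim.
K, d_tok, d_latent = 4, 8, 16

# Stage 1 (sketch): compress a chunk of K token embeddings into ONE
# continuous vector, then reconstruct the chunk from it.
W_enc = rng.normal(size=(K * d_tok, d_latent)) / np.sqrt(K * d_tok)
W_dec = rng.normal(size=(d_latent, K * d_tok)) / np.sqrt(d_latent)

def encode(chunk):  # chunk: (K, d_tok) -> latent z: (d_latent,)
    return chunk.reshape(-1) @ W_enc

def decode(z):      # latent z: (d_latent,) -> reconstructed chunk: (K, d_tok)
    return (z @ W_dec).reshape(K, d_tok)

# Stage 2 (sketch): the language model is autoregressive over chunk
# vectors, so a T-token sequence takes T // K steps instead of T.
T = 32
n_steps_token_lm = T       # one autoregressive step per token
n_steps_calm = T // K      # one autoregressive step per K-token chunk

chunk = rng.normal(size=(K, d_tok))
z = encode(chunk)
print(z.shape, decode(z).shape, n_steps_token_lm, n_steps_calm)
```

The real autoencoder is trained for near-lossless reconstruction; the point of the sketch is only the shape of the pipeline: K tokens in, one vector out, and a K-fold reduction in generation steps.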

Key Features

  • 🚀 Ultra-Efficient by Design: Dramatically improves training and inference efficiency by reducing the number of autoregressive steps by a factor of K.
  • 💡 A New Scaling Axis: Introduces a new scaling dimension for LLMs—semantic bandwidth (K). Instead of just scaling parameters and data, you can now scale the amount of information processed in a single step.
  • 🛠️ A Comprehensive Likelihood-Free Toolkit: Operating in a continuous domain requires new tools. This repository provides the full suite of algorithms that make CALM possible:
    • A Robust Autoencoder to learn high-fidelity continuous representations of token chunks.
    • Energy-Based Training, a principled and likelihood-free method for generative modeling.
    • BrierLM, a new metric for calibrated, likelihood-free evaluation of language models.
    • Temperature Sampling for controlled, high-quality text generation using only a black-box sampler.
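To illustrate what "likelihood-free evaluation" means in practice, here is a toy sketch of a sample-only Brier estimator (my own illustration, not the repository's BrierLM implementation): for a model distribution p and ground-truth token x, the quantity 2·p(x) − Σ_y p(y)² can be estimated without ever reading probabilities, using two i.i.d. samples per pair. The vocabulary and probabilities below are made up.

```python
import random

random.seed(0)

def brier_estimate(sample_fn, target, n_pairs=2000):
    """Likelihood-free estimate of 2*p(target) - sum_y p(y)^2.

    Uses only a black-box sampler: E[1{X1=target} + 1{X2=target} - 1{X1=X2}]
    equals the quantity above when X1, X2 are i.i.d. draws from p.
    Higher is better; no access to model probabilities is needed.
    """
    total = 0.0
    for _ in range(n_pairs):
        x1, x2 = sample_fn(), sample_fn()
        total += (x1 == target) + (x2 == target) - (x1 == x2)
    return total / n_pairs

# Toy "model": a black-box sampler over a 3-word vocabulary.
vocab, probs = ["a", "b", "c"], [0.6, 0.3, 0.1]
sample = lambda: random.choices(vocab, probs)[0]

est = brier_estimate(sample, "a")
exact = 2 * 0.6 - (0.6**2 + 0.3**2 + 0.1**2)  # = 0.74
print(round(est, 2), exact)
```

The same black-box-sampler viewpoint is what makes temperature sampling and energy-based training workable in the continuous setting: everything is expressed through draws from the model rather than through explicit likelihoods.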

How to use

We provide scripts for training and evaluation in our GitHub README.

Sample Usage (Text Generation)

You can explore the core implementation of CALM in the GitHub repository. The released checkpoints on the 🤗 Hugging Face Hub bundle our custom modeling code, so you only need to set trust_remote_code=True when loading the models through the Transformers library.

from transformers import pipeline, AutoTokenizer
import torch

model_name = "cccczshao/CALM-M" # Example model from the collection
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])

Contact

If you have any questions, feel free to submit an issue or contact chenzeshao@tencent.com.
