CALM-70B: Conversational Agentic Language Model

Model Description

CALM-70B is our mid-scale Conversational Agentic Language Model, designed to integrate Task-Oriented Dialogue (TOD) capabilities with Language Agent (LA) functionalities at a larger scale than its predecessor, CALM-8B. By leveraging CALM-IT, a multi-task dataset that interleaves multi-turn ReAct reasoning with complex API usage, CALM-70B achieves state-of-the-art performance across TOD and function-calling benchmarks.

CALM-70B has been fine-tuned on a comprehensive multi-task dataset covering dialogue state tracking, function calling, and multi-turn reasoning, and it surpasses even proprietary models such as GPT-4o on major conversational evaluation benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA).

Model Sources

  • Paper: [More Information Needed]
  • Repository: [More Information Needed]

Model Details

  • Model Name: CALM-70B
  • Developed by: Collaboration of the UIUC Conversational AI Lab and Oumi
  • License: Apache 2.0
  • Architecture: Fine-tuned Llama 3.3 70B Instruct
  • Parameter Count: 70B
  • Training Data: CALM-IT
  • Training Type: Full Fine-tuning (FFT)
  • Fine-tuning Framework: Oumi
  • Training Hardware: 8 NVIDIA H100 GPUs
  • Training Duration: ~24 hours
  • Evaluation Benchmarks: MultiWOZ 2.4, BFCL V3, API-Bank
  • Release Date: February 5, 2025

Capabilities and Features

🗣 Conversational Agentic Abilities

  • Multi-turn Dialogue Mastery: Handles long conversations with accurate state tracking.
  • Advanced Function Calling: Dynamically selects and executes API calls for task completion.
  • Enhanced ReAct-based Reasoning: Integrates structured reasoning (User-Thought-Action-Observation-Thought-Response); see the sketch after this list.
  • Zero-Shot Generalization: Excels in unseen function-calling and TOD tasks.
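
To make the reasoning structure concrete, here is a minimal sketch of what one such turn could look like, written as a Python message list. The role names and the find_restaurant tool are illustrative assumptions, not the released CALM-IT format.

# Hypothetical ReAct-style turn; role names and the find_restaurant tool
# are illustrative assumptions, not the official CALM-IT schema.
trace = [
    {"role": "user", "content": "Book a table for two in Cambridge tonight."},
    {"role": "thought", "content": "A booking request; first check availability."},
    {"role": "action", "content": '{"name": "find_restaurant", '
                                  '"arguments": {"area": "centre", "people": 2}}'},
    {"role": "observation", "content": '{"results": [{"name": "Midsummer House", "available": true}]}'},
    {"role": "thought", "content": "One match is available; confirm with the user."},
    {"role": "response", "content": "Midsummer House has a table for two tonight. Shall I book it?"},
]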

🚀 Benchmark Performance

  • MultiWOZ 2.4 (TOD): Strong performance in dialogue state tracking and task success.
  • BFCL V3 (LA): Superior function-calling abilities compared to specialized language agents.
  • API-Bank (LA): High accuracy in API call generation and response synthesis.

Training Process

🔧 Fine-tuning Stages

  1. TOD Fine-tuning: Optimized for dialogue state tracking (e.g., augmented SNIPS in an instruction-tuned format); a sketch of such an instance follows this list.
  2. Function Calling Fine-tuning: Trained to generate precise API calls from LA datasets.
  3. ReAct-based Fine-tuning: Enhances multi-turn conversations with API integrations through structured reasoning.
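
As a rough illustration of the first stage, a single instruction-tuned DST instance might look like the sketch below. The field and slot names follow MultiWOZ-style conventions; the exact CALM-IT fields are assumptions.

# Hypothetical DST training instance (field and slot names are assumptions,
# not the released CALM-IT schema; slots follow MultiWOZ-style naming).
dst_example = {
    "instruction": "Output the dialogue state as JSON after the last user turn.",
    "dialogue": [
        {"role": "user", "content": "I need a cheap hotel in the north."},
        {"role": "assistant", "content": "Sure, for how many nights?"},
        {"role": "user", "content": "Three nights from Friday."},
    ],
    "state": {
        "hotel-pricerange": "cheap",
        "hotel-area": "north",
        "hotel-stay": "3",
        "hotel-day": "friday",
    },
}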

πŸ” Training Hyperparameters

  • Base Model: Llama 3.3 70B Instruct
  • LoRA Config: Rank = 16, Scaling Factor = 32
  • Batch Size: 7
  • Learning Rate: 4e-5
  • Optimizer: AdamW (betas = 0.9, 0.999, epsilon = 1e-8)
  • Precision: Mixed precision (bfloat16)
  • Warm-up Steps: 24
  • Gradient Accumulation Steps: 1
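
For concreteness, the sketch below maps these settings onto a Hugging Face transformers/PEFT-style setup. This is an assumption for illustration only; the model was actually trained with the Oumi framework, whose configuration format differs.

# Illustrative mapping of the listed hyperparameters onto transformers/PEFT;
# the actual training used Oumi, so treat this as a sketch, not the real config.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="calm-70b-ft",          # hypothetical output path
    per_device_train_batch_size=7,
    gradient_accumulation_steps=1,
    learning_rate=4e-5,
    warmup_steps=24,
    bf16=True,                         # mixed-precision bfloat16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)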

Usage

πŸ— How to Load the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("uiuc-convai/CALM-70B")
# Weights are released in bfloat16; device_map="auto" shards across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "uiuc-convai/CALM-70B", torch_dtype=torch.bfloat16, device_map="auto"
)

🛠 Example Inference
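
Pending an official example, here is a minimal generation sketch reusing the model and tokenizer loaded above. It assumes the tokenizer inherits the Llama 3.3 chat template; the prompt and generation length are illustrative.

# Minimal generation sketch; assumes the Llama 3.3 chat template is inherited.
messages = [
    {"role": "user", "content": "Find me a cheap Italian restaurant in the centre."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))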

TODO

  • Scalability to CALM-405B: The next iteration will extend these capabilities to even larger-scale conversations.
  • Continuous Open-Source Expansion: Ongoing release of datasets, model weights, and training artifacts to foster community research.

Citation

If you use CALM-70B in your research, please cite:

@article{yourpaper2024,
  title={CALM: Conversational Agentic Language Model},
  author={Your Name and Collaborators},
  journal={Your Conference/Journal},
  year={2024}
}

For more details, visit the Project Repository or contact acikgoz2@illinois.edu.
