The sections below show how to use openchat/openchat_8192 with libraries, inference servers, and local apps.

Libraries

How to use openchat/openchat_8192 with Transformers:

# Use a pipeline as a high-level helper (loaded in bfloat16, as recommended below)
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="openchat/openchat_8192", torch_dtype=torch.bfloat16)

# Load model directly (also in bfloat16)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_8192")
model = AutoModelForCausalLM.from_pretrained("openchat/openchat_8192", torch_dtype=torch.bfloat16)


vLLM

How to use openchat/openchat_8192 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openchat/openchat_8192"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openchat/openchat_8192",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
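The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server is reachable at the base URL you pass in and returns the usual OpenAI completions response shape (`choices[0].text`); the helper names `build_completion_request` and `complete` are hypothetical. The prompt is sent as-is, so OpenChat's conversation template is not applied.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example (requires a running server):
# req = build_completion_request("http://localhost:8000", "openchat/openchat_8192", "Once upon a time,")
# print(complete(req))
```

For the SGLang server described below, pass `http://localhost:30000` as the base URL instead.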

Use Docker

docker model run hf.co/openchat/openchat_8192

SGLang

How to use openchat/openchat_8192 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openchat/openchat_8192" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openchat/openchat_8192",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openchat/openchat_8192" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openchat/openchat_8192",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use openchat/openchat_8192 with Docker Model Runner:
```
docker model run hf.co/openchat/openchat_8192
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

OpenChat: Less is More for Open-source Models

OpenChat is a series of open-source language models fine-tuned on a diverse and high-quality dataset of multi-round conversations. With only ~6K GPT-4 conversations filtered from the ~90K ShareGPT conversations, OpenChat is designed to achieve high performance with limited data.

Generic models:

OpenChat: based on LLaMA-13B (2048 context length)
- 🚀 105.7% of ChatGPT score on Vicuna GPT-4 evaluation
- 🔥 80.9% Win-rate on AlpacaEval
- 🤗 Fine-tuned on only 6K conversations!
OpenChat-8192: based on LLaMA-13B (extended to 8192 context length)
- 106.6% of ChatGPT score on Vicuna GPT-4 evaluation
- 79.5% Win-rate on AlpacaEval

Code models:

OpenCoderPlus: based on StarCoderPlus (native 8192 context length)
- 102.5% of ChatGPT score on Vicuna GPT-4 evaluation
- 78.7% Win-rate on AlpacaEval

Note: Please load the pretrained models in bfloat16.

Code and Inference Server

We provide the full source code, including an inference server compatible with the "ChatCompletions" API, in the OpenChat GitHub repository.

Web UI

OpenChat also includes a web UI for a better user experience. See the GitHub repository for instructions.

Conversation Template

The conversation template is built by concatenating token IDs rather than by tokenizing one formatted string.

In addition to the base model vocabulary, an end-of-turn token, <|end_of_turn|>, is added; its ID is written as eot_token_id below.

# OpenChat
[bos_token_id] + tokenize("Human: ") + tokenize(user_question) + [eot_token_id] + tokenize("Assistant: ")
# OpenCoder
tokenize("User:") + tokenize(user_question) + [eot_token_id] + tokenize("Assistant:")
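The two single-turn formulas above translate directly into Python. In this sketch, `tokenize` stands for any function that maps text to token IDs without adding special tokens, and the function names are hypothetical. With a Hugging Face tokenizer, one assumed choice is `lambda s: tokenizer.encode(s, add_special_tokens=False)`, with `tokenizer.bos_token_id` and `tokenizer.convert_tokens_to_ids("<|end_of_turn|>")` supplying the special IDs; check this against the OpenChat repository before relying on it.

```python
from typing import Callable, List

Tokenize = Callable[[str], List[int]]


def openchat_prompt_ids(tokenize: Tokenize, user_question: str,
                        bos_token_id: int, eot_token_id: int) -> List[int]:
    """OpenChat: [bos] + "Human: " + question + [eot] + "Assistant: ".

    Each piece is tokenized on its own and the ID lists are concatenated.
    """
    return ([bos_token_id]
            + tokenize("Human: ")
            + tokenize(user_question)
            + [eot_token_id]
            + tokenize("Assistant: "))


def opencoder_prompt_ids(tokenize: Tokenize, user_question: str,
                         eot_token_id: int) -> List[int]:
    """OpenCoder: no BOS token, and the role prefixes have no trailing space."""
    return (tokenize("User:")
            + tokenize(user_question)
            + [eot_token_id]
            + tokenize("Assistant:"))
```

The model's reply is then generated after the final "Assistant" prefix until it emits the end-of-turn token.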

Hint: In BPE, tokenize(A) + tokenize(B) does not always equal tokenize(A + B).
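A toy byte-pair encoder makes the hint concrete. The sketch below uses the standard BPE encoding loop (repeatedly merge the adjacent pair with the lowest merge rank) with a made-up two-rule merge table; it is not LLaMA's real tokenizer or merge list. Tokenizing "Human: " and "Hi" separately keeps the space attached to the colon, while tokenizing the joined string lets the higher-priority " H" merge win, so reproducing the template means tokenizing each piece on its own, exactly as written.

```python
from typing import List, Sequence, Tuple


def bpe_tokenize(text: str, merges: Sequence[Tuple[str, str]]) -> List[str]:
    """Toy BPE encoder: repeatedly merge the adjacent pair with the lowest rank."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(text)
    while len(tokens) > 1:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        # Unranked pairs get a rank worse than any real merge.
        best = min(pairs, key=lambda p: ranks.get(p, len(ranks)))
        if best not in ranks:
            break  # no mergeable pair left
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


MERGES = [(" ", "H"), (":", " ")]  # hypothetical merge table, rank 0 first

separate = bpe_tokenize("Human: ", MERGES) + bpe_tokenize("Hi", MERGES)
joint = bpe_tokenize("Human: Hi", MERGES)
print(separate)  # ['H', 'u', 'm', 'a', 'n', ': ', 'H', 'i']
print(joint)     # ['H', 'u', 'm', 'a', 'n', ':', ' H', 'i']
```

Both token lists decode to the same text, but the token sequences (and therefore the IDs the model sees) differ.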

The following code generates the conversation templates:

from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # Prompt
    system: Optional[str]

    role_prefix: dict
    ai_role: str
    eot_token: str
    bos_token: Optional[str] = None

    # Get template
    def generate_conversation_template(self, tokenize_fn, tokenize_special_fn, message_list):
        tokens = []
        masks = []  # True for tokens of messages from ai_role, False otherwise

        # Beginning-of-sequence (BOS) token
        if self.bos_token:
            t = tokenize_special_fn(self.bos_token)
            tokens.append(t)
            masks.append(False)

        # System
        if self.system:
            t = tokenize_fn(self.system) + [tokenize_special_fn(self.eot_token)]
            tokens.extend(t)
            masks.extend([False] * len(t))

        # Messages
        for idx, message in enumerate(message_list):
            # Prefix
            t = tokenize_fn(self.role_prefix[message["from"]])
            tokens.extend(t)
            masks.extend([False] * len(t))

            # Message
            if "value" in message:
                t = tokenize_fn(message["value"]) + [tokenize_special_fn(self.eot_token)]
                tokens.extend(t)
                masks.extend([message["from"] == self.ai_role] * len(t))
            else:
                assert idx == len(message_list) - 1, "An empty message (the one to be completed) must be the last message."

        return tokens, masks


MODEL_CONFIG_MAP = {
    # OpenChat / OpenChat-8192
    "openchat": ModelConfig(
        # Prompt
        system=None,

        role_prefix={
            "human": "Human: ",
            "gpt": "Assistant: "
        },
        ai_role="gpt",
        eot_token="<|end_of_turn|>",
        bos_token="<s>",
    ),

    # OpenCoder / OpenCoderPlus
    "opencoder": ModelConfig(
        # Prompt
        system=None,

        role_prefix={
            "human": "User:",
            "gpt": "Assistant:"
        },
        ai_role="gpt",
        eot_token="<|end_of_turn|>",
        bos_token=None,
    )
}
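`generate_conversation_template` returns a mask that is True only for tokens of messages sent by `ai_role` (the reply text plus its end-of-turn token). The code above does not show how the masks are consumed; a common approach, assumed in this sketch rather than taken from the OpenChat training code, is to turn them into labels where every other position gets -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so only assistant replies contribute to the loss. The helper name `masks_to_labels` is hypothetical.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def masks_to_labels(tokens: Sequence[int], masks: Sequence[bool],
                    ignore_index: int = IGNORE_INDEX) -> List[int]:
    """Keep the token ID as the label where mask is True; ignore every other position."""
    if len(tokens) != len(masks):
        raise ValueError("tokens and masks must have the same length")
    return [tok if keep else ignore_index for tok, keep in zip(tokens, masks)]
```

Hugging Face causal language models shift labels by one position internally, so labels built this way stay aligned with the input IDs.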
