baichuan-inc/Baichuan2-13B-Chat
This is the baichuan-inc/Baichuan2-13B-Chat model converted to OpenVINO format with INT8 weight compression for accelerated inference.
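INT8 weight compression stores each weight in one byte plus a small number of floating-point scale factors, instead of two bytes per weight in FP16. The sketch below illustrates the general idea with a symmetric per-row scheme in plain Python. This is an assumption for illustration only. The exact scheme used to produce this model (for example, symmetric vs. asymmetric, or the group size) may differ. The `quantize_row`, `dequantize_row` and `weight_storage_gb` helpers are hypothetical names, and the toy weights are made up.

```python
# Illustrative sketch of per-row symmetric INT8 weight quantization.
# Hypothetical helpers; not necessarily the exact scheme used to compress this model.

def quantize_row(row):
    """Map one row of float weights to int8 values sharing a single scale."""
    max_abs = max(abs(w) for w in row)
    # One scale per row: the largest magnitude maps to 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights at inference time."""
    return [v * scale for v in q]


def weight_storage_gb(num_params, bytes_per_param):
    """Rough weight storage in GB, ignoring the small overhead of the scales."""
    return num_params * bytes_per_param / 1e9


# Toy weight matrix (made-up values).
weights = [
    [0.50, -1.27, 0.01],
    [2.54, 0.00, -0.635],
]

for row in weights:
    q, scale = quantize_row(row)
    restored = dequantize_row(q, scale)
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(q, scale, max_err)

# Assuming roughly 13 billion parameters, as the model name suggests:
print("FP16 weights:", weight_storage_gb(13e9, 2), "GB")
print("INT8 weights:", weight_storage_gb(13e9, 1), "GB")
```

The rounding error per weight is at most half a scale step. Halving the bytes per weight roughly halves the weight storage, about 26 GB to 13 GB for 13 billion parameters. Actual runtime memory also includes activations and the KV cache.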
An example of how to run inference with this model:
```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

# model_id can be either a local directory or a model ID on the Hugging Face Hub.
model_id = "helenai/baichuan-inc-Baichuan2-13B-Chat-ov"

# Baichuan2 relies on custom code from the Hub, so trust_remote_code=True is passed when loading.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = OVModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Wrap the OpenVINO model in a standard transformers text-generation pipeline.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = pipe("hello world")
print(result)
```