Why use the InferenceClient?
🤝 Seamless transition: keep your existing code structure while leveraging LLMs hosted on the Hugging Face Hub.
🤗 Direct integration: easily launch a model to run inference using our Inference Endpoints service.
🚀 Stay updated: always be in sync with the latest Text-Generation-Inference (TGI) updates.
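As a minimal sketch of that "seamless transition", the snippet below uses the OpenAI-compatible syntax exposed by `InferenceClient` from `huggingface_hub`; the model ID is only an example and you may need to pass a `token` depending on your setup:

```python
from huggingface_hub import InferenceClient

# Uses your Hugging Face credentials; pass token="hf_..." explicitly if needed.
client = InferenceClient()

# Same call shape as the OpenAI SDK: client.chat.completions.create(...)
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model; any chat model on the Hub works
    messages=[{"role": "user", "content": "Why is open-source software important?"}],
    max_tokens=100,
)

print(response.choices[0].message.content)
```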
More details can be found in the inference guide: https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#openai-compatibility