We are introducing multi-backend support in Hugging Face Text Generation Inference! With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and available hardware. This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron and TPU).
We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned!
Today we're releasing The Stack v2 & StarCoder2: a series of 3B, 7B & 15B code generation models trained on 3.3 to 4.5 trillion tokens of code:
- StarCoder2-15B matches or outperforms CodeLlama 34B, and approaches DeepSeek-33B on multiple benchmarks.
- StarCoder2-3B outperforms StarCoderBase-15B and similarly sized models.
- The Stack v2 is a 4x larger dataset than The Stack v1, resulting in 900B unique code tokens.

As always, we released everything from models and datasets to curation code. Enjoy!