Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Jan 16 • 75
Article CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG Mar 15, 2024 • 11
Article CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG (Chinese translation) Mar 15, 2024
Article Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive Jan 15, 2024 • 7
Article AMD + 🤗: Large Language Models Out-of-the-Box Acceleration with AMD GPU Dec 5, 2023 • 4
Article Optimum-NVIDIA - Unlock blazingly fast LLM inference in just 1 line of code Dec 5, 2023 • 5
Article Accelerating over 130,000 Hugging Face models with ONNX Runtime Oct 4, 2023 • 1
Article Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs Jan 13, 2022 • 2
Article Introducing Optimum: The Optimization Toolkit for Transformers at Scale Sep 14, 2021 • 2