Article Introducing HELMET: Holistically Evaluating Long-context Language Models Apr 16 • 40
Article Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques By jmamou and 8 others • Mar 24 • 20
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Paper • 2502.09390 • Published Feb 13 • 16
Article Assisted Generation: a new direction toward low-latency text generation May 11, 2023 • 74
Article Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon Apr 3, 2024 • 11
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5, 2024 • 39
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 260
Accelerating Speculative Decoding using Dynamic Speculation Length Paper • 2405.04304 • Published May 7, 2024 • 2
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23, 2024 • 18
Article Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon May 9, 2024 • 12
Improving Classification Performance With Human Feedback: Label a few, we label the rest Paper • 2401.09555 • Published Jan 17, 2024 • 6
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Paper • 2306.14048 • Published Jun 24, 2023 • 13