Evaluating LLM outputs is often hard, since many tasks require open-ended answers for which no deterministic metric works: for instance, when asking a model to summarize a text, there could be hundreds of correct ways to do it. The most versatile way to grade these outputs is then human evaluation, but it is very time-consuming and therefore costly.
Then why not ask another LLM to do the evaluation, by providing it with relevant rating criteria? This is the idea behind LLM-as-a-judge.
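Here is a minimal sketch of that idea in Python. Everything in it is an assumption made for illustration, not the cookbook's code: the prompt wording, the 1 to 4 scale, the "Total rating:" output convention, and the names `JUDGE_PROMPT`, `extract_rating`, `judge` and `fake_llm` are all hypothetical. The judge model is passed in as a plain function, so you can plug in whichever LLM client you use.

```python
import re
from typing import Callable, Optional

# Hypothetical judge prompt: the criteria and the 1-4 scale are illustrative
# assumptions, not taken from the cookbook.
JUDGE_PROMPT = """You will be given a question and an answer.
Rate how well the answer addresses the question, on a scale of 1 to 4:
1 = not helpful at all, 4 = fully and accurately answers the question.

Give your reasoning first, then finish with a line of the form:
Total rating: <number>

Question: {question}
Answer: {answer}
"""


def extract_rating(judge_output: str) -> Optional[int]:
    """Return the 1-4 rating found in the judge's reply, or None if absent."""
    match = re.search(r"Total rating:\s*([1-4])\b", judge_output)
    return int(match.group(1)) if match else None


def judge(question: str, answer: str, call_llm: Callable[[str], str]) -> Optional[int]:
    """Fill in the prompt, send it to the judge model, and parse the rating."""
    prompt = JUDGE_PROMPT.format(question=question, answer=answer)
    return extract_rating(call_llm(prompt))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return "The answer is short but correct.\nTotal rating: 3"


if __name__ == "__main__":
    print(judge("What is the capital of France?", "Paris.", fake_llm))
```

To try it with a real model, swap `fake_llm` for a function that sends the prompt to your LLM and returns its text reply. Replies that do not follow the expected format come back as `None`, so they can be counted or retried rather than silently scored.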
To implement an LLM judge correctly, you need a few tricks. So I've just published a new notebook showing how to implement it properly in our Hugging Face Cookbook! (You can run it instantly in Google Colab.) ➡️ LLM-as-a-judge cookbook: https://huggingface.co/learn/cookbook/llm_judge
The Cookbook is a great collection of notebooks demonstrating recipes (hence the name "cookbook") for common LLM usages. I recommend you go take a look! ➡️ All cookbooks: https://huggingface.co/learn/cookbook/index
DeepLearning.AI just announced a new short course: Open Source Models with Hugging Face 🤗, taught by Hugging Face's own Maria Khalusova, Marc Sun and Younes Belkada!
As many of you already know, Hugging Face has been a game changer by letting developers quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models.
You'll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you'll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces.
Thank you very much to Hugging Face's wonderful team for working with us on this.