It feels awkward to make my first post about my own work, but this is a weekend project that I really enjoyed working on. I'd love to meet more people interested in random ideas like this.
A hard part of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt?
Predictive human preference aims to predict which model users might prefer for a specific query.
https://huyenchip.com/2024/02/28/predictive-human-preference.html
One use case is model routing. If we know in advance that for a prompt, users will prefer Claude Instant’s response over GPT-4, and Claude Instant is cheaper/faster than GPT-4, we can route this prompt to Claude Instant. Model routing has the potential to increase response quality while reducing costs and latency.
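As a rough illustration, routing could look like the minimal sketch below. Everything in it is an assumption made for the example: the `Model` class, the relative costs, the 0.5 threshold, and especially `toy_predictor`, which is a word-count placeholder standing in for a trained preference predictor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    name: str
    relative_cost: float  # hypothetical number, not real pricing


def route(
    prompt: str,
    strong: Model,
    weak: Model,
    predict_weak_preferred: Callable[[str], float],
    threshold: float = 0.5,
) -> Model:
    """Pick the cheaper model when users are predicted to prefer it often enough."""
    p_weak = predict_weak_preferred(prompt)
    return weak if p_weak >= threshold else strong


def toy_predictor(prompt: str) -> float:
    # Placeholder heuristic: treat very short prompts as "easy".
    # A real predictor would be a model trained on human preference data.
    return 0.6 if len(prompt.split()) <= 5 else 0.2


gpt4 = Model("GPT-4", relative_cost=10.0)
claude_instant = Model("Claude Instant", relative_cost=1.0)

easy = route("hello, how are you?", gpt4, claude_instant, toy_predictor)
hard = route(
    "Explain step by step why the sky looks blue at noon and red at sunset",
    gpt4,
    claude_instant,
    toy_predictor,
)
print(easy.name, "|", hard.name)
```

Raising the threshold sends more prompts to the stronger model, so it acts as a knob for trading cost against quality.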
One pattern is that for simple prompts, weak models can do (nearly) as well as strong models. For more challenging prompts, however, users are more likely to prefer stronger models. Here’s a visualization of predicted human preference for an easy prompt (“hello, how are you?”) and a challenging prompt (“Explain why Planck length …”).
Preference predictors make it possible to create leaderboards unique to any prompt and domain.
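One way to picture a prompt-specific leaderboard: take the predicted pairwise win probabilities for a single prompt and rank models by their average predicted win rate. The sketch below assumes that approach. The model names and probabilities are made up for illustration, and ranking by average win rate is one simple choice among several, not necessarily what the linked post uses.

```python
from typing import Callable, List, Tuple


def prompt_leaderboard(
    models: List[str],
    win_prob: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Rank models by their mean predicted win rate against every other model."""
    scores = {}
    for m in models:
        others = [o for o in models if o != m]
        scores[m] = sum(win_prob(m, o) for o in others) / len(others)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical predictions for ONE prompt: P(first model preferred over second).
predicted = {
    ("model_a", "model_b"): 0.7,
    ("model_a", "model_c"): 0.8,
    ("model_b", "model_c"): 0.6,
}


def lookup(a: str, b: str) -> float:
    # Only one direction of each pair is stored; the reverse is its complement.
    if (a, b) in predicted:
        return predicted[(a, b)]
    return 1.0 - predicted[(b, a)]


board = prompt_leaderboard(["model_a", "model_b", "model_c"], lookup)
for name, score in board:
    print(f"{name}: {score:.2f}")
```

Running the predictor over a set of prompts from one domain and averaging the scores would give a domain-level leaderboard in the same way.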