Inference Endpoints Changelog 🚀
Week 46, Nov 11 - Nov 17
No changes this week as the team was on an off-site in Martinique! But a lot of ideas and energy cooked up for the coming week 🙌
Week 45, Nov 04 - Nov 10
This week, we have some awesome updates that are finally out 🙌
- Scaling replicas based on pending requests is now in beta 🔥 Since it's in beta, things might change, but you can try it out and read more about it here
- Improved analytics with a graph of the replica history
- Updates to the widgets
- Fixed bug in streaming
- Conversations can now be cleared
- Submit message with cmd+enter
Week 44, Oct 28 - Nov 03
Probably the biggest update this week was a revamp to the Inference Catalogue 🔥 You can now with a one-click-deploy find a model based on:
- license
- price range
- inference server
- accelerator
- and the previously existing task and search filters
Additionally:
- we fixed the config for
MoritzLaurer/deberta-v3-large-zeroshot-v2.0
so that you can run it on CPU as well - and also thanks to @ngxson for fixing a bug in the llama.cpp snippet
Week 43, Oct 21-27
This week you'll get a sneak peak of the upcoming autoscaling, in the form of analytics 👀
We have:
- Added pending http requests to the analytics
- Support for Image-Text-To-Text, aka language vision models 🔥 (llama vision has some good jokes 😅)
- Improved the log pagination and added some nice visual touches
- Fixed a bug related to total request count in the analytics
Week 42, Oct 14-20
This week was unfortunately slower on the user-facing updates.
Behind the scenes, we:
- fixed several recommendation values for LLaMA and Qwen 2,
- improved our internal analytics,
- debugged issues related to weights downloading and getting 429s,
- and hopefully squashed the last bugs so we can soon release the new autoscaling 🔥
Week 41, Oct 7-13
This week we had a lot of nice UI/UX improvements:
Additionally:
- deprecated the "text2text-generation" tasks, it's been deprecated on the Hub and in the Inference API as well
- you can now pass the "seed" parameter in the widget for diffuser models
- small bug fixes on llama.cpp containers
- you can directly play in the widget with openAI API parameters
- Shoutout to Alvaro for making the NVLM-D-72B model compatible on endpoints 🙌
On the backend we're also making improvements to the autoscaling. This might not immediately have noticeable impact for user but soon it'll ripple to the front end as well. Stay tuned 👀