Feature requests and suggestions for V2

#4 by zhangce - opened
Together org

We are starting to work on V2 and would love to hear your suggestions and top requests!

Together org

Great suggestion from @espadrine on Twitter: FLAN support

This is at the top of our list for v2 right now!

Increase the input sequence length beyond 2048 tokens.

Can you also train it with reinforcement learning, like OpenAI does?

Great work! Any timeline on when V2 will be available?
We are very interested in using GPT-JT for our BLIP-2 model: https://twitter.com/LiJunnan0409/status/1620259379223343107
In our current experiments, GPT-JT v1 outperforms OPT-6.7B but still underperforms FLAN-T5.

Together org

@JunnanLi Thank you for your interest! Our team is actively testing larger models and more data. We will release new models in the near future, possibly within a couple of weeks. Keep an eye out for updates!

Hi @juewang, any news on V2?

I have an RTX 3090; how long should I expect the model to take to load and respond if it's run locally? I was hoping the model would stay loaded, like with Stable Diffusion, so that I could keep using it without having to reload it each time I call the program.

Together org

If you're using the from_pretrained function to load the model locally, it typically takes around 2-3 minutes; most of that time is spent on random weight initialization. And yes, you can keep the model loaded so that you don't have to reload it each time.
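
For illustration, here is a minimal loading sketch. The togethercomputer/GPT-JT-6B-v1 checkpoint name, fp16 precision, and the low_cpu_mem_usage flag are assumptions chosen for a single 24 GB GPU such as an RTX 3090, not settings prescribed in this thread:

```python
# Minimal sketch: load GPT-JT once, move it to the GPU, and keep the Python
# process alive so later calls reuse the same model object.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "togethercomputer/GPT-JT-6B-v1"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # half precision so the 6B model fits in 24 GB of VRAM
    low_cpu_mem_usage=True,     # skip the random-initialization pass to load faster
)
model.to("cuda")
model.eval()
```

As long as this process stays running, subsequent generate calls reuse the loaded weights and skip the load entirely.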

The inference response time will depend on your generation configuration, particularly the max_new_tokens setting. Generally, the response time scales roughly linearly with max_new_tokens. For most configurations it is a few seconds at most; if you expect a short response, set a small value to speed up inference.
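
A generation sketch showing how max_new_tokens bounds latency; the prompt, the 32-token cap, and greedy decoding below are illustrative choices, not recommended values:

```python
# Decoding time grows roughly linearly with the number of generated tokens,
# so a small max_new_tokens cap keeps responses to a few seconds.
prompt = "Q: What is the capital of France?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=32,                    # small cap for a quick reply
        do_sample=False,                      # greedy decoding, deterministic output
        pad_token_id=tokenizer.eos_token_id,  # GPT-J has no pad token; reuse EOS
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```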
