Feature requests and suggestions for V2
We are starting to work on V2 and would love to hear your suggestions and top requests!
Increase the maximum input sequence length beyond 2048 tokens.
Can you also train it with reinforcement learning, like OpenAI?
Sparse Upcycling might be cool to try! https://twitter.com/arankomatsuzaki/status/1602126140696629249?s=20&t=qnFaselW3mXcm-UZn7ISlA
Great work! Any timeline on when V2 will be available?
We are very interested in using GPT-JT for our BLIP-2 model: https://twitter.com/LiJunnan0409/status/1620259379223343107
In our current experiments, GPT-JT v1 outperforms OPT-6.7B but still underperforms FLAN-T5.
I have an RTX 3090. How long should I expect the model to take to load and respond if it's run locally? I was hoping the model would stay loaded, like with Stable Diffusion, so that I could keep using it without having to reload it each time I call the program.
If you're using the `from_pretrained` function to load the model locally, it typically takes around 2-3 minutes -- most of this time is spent on random weight initialization. And yes, you can keep the model loaded so that you don't have to reload it each time.
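For reference, here's a minimal sketch of loading the model once and keeping it resident in GPU memory, assuming the Hugging Face `transformers` library; the checkpoint name `togethercomputer/GPT-JT-6B-v1` and the fp16 setting are my assumptions, so adjust them for your setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; swap in whichever GPT-JT checkpoint you use.
model_name = "togethercomputer/GPT-JT-6B-v1"

# Load once at startup; afterwards the weights stay resident in GPU memory.
# fp16 keeps the 6B model within a 24 GB RTX 3090.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")
model.eval()
```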
The inference response time will depend on your generation configuration, particularly the `max_new_tokens` setting. Generally, the response time is linearly related to `max_new_tokens`. For most configurations, the response time is typically several seconds at most; if your expected response is short, you can set a small value to accelerate inference.
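Continuing from the loading sketch above, something like this lets you trade response length for latency via `max_new_tokens` (the `respond` helper is just an illustration, not part of any API):

```python
def respond(prompt: str, max_new_tokens: int = 32) -> str:
    # Decoding time grows roughly linearly with the number of
    # generated tokens, so a smaller max_new_tokens means a faster reply.
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated text is returned.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# The model stays loaded between calls -- no reloading needed.
print(respond("Q: What is the capital of France?\nA:", max_new_tokens=8))
```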