ONNX version or compatibility with T5ForConditionalGeneration
by michaelfeil
I'm looking to convert this model to a faster version for accelerated inference (2B, 6B, 16B checkpoints).
Options:
- CTranslate2: supports many architectures such as T5, mT5, GPT-J, GPT-2, ... As with codet5p-770m-py, this now runs at high speed with a 1320 MiB CUDA footprint and batch inference, which I think is awesome: https://huggingface.co/michaelfeil/ct2fast-codet5p-770m-py -> Is there any way to convert this model to a T5 architecture? (See the conversion sketch after this list.)
- ONNX -> ONNX Runtime (ORT) or NVIDIA TensorRT: CodeT5pModuleConfig has no ONNX export implementation, unlike e.g. CodeGen2. (See the export sketch after this list.)
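
For reference, this is a minimal sketch of how the 770m conversion works with CTranslate2's Python API (assuming `ctranslate2` and `transformers` are installed; the quantization setting is just what I used). The larger 2B/6B/16B checkpoints presumably fail at the conversion step because their custom architecture is not recognized by the converter, which is exactly the problem:

```python
import ctranslate2
import transformers

model_id = "Salesforce/codet5p-770m-py"  # standard T5ForConditionalGeneration layout

# One-time conversion (equivalent to the ct2-transformers-converter CLI).
converter = ctranslate2.converters.TransformersConverter(model_id)
out_dir = converter.convert("ct2-codet5p-770m-py", quantization="int8_float16")

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
translator = ctranslate2.Translator(out_dir, device="cuda")

# Batch inference: CTranslate2 consumes token strings, not ids.
prompts = ["def fibonacci(n):", "def quicksort(arr):"]
batch = [tokenizer.convert_ids_to_tokens(tokenizer.encode(p)) for p in prompts]
results = translator.translate_batch(batch, max_decoding_length=128)

for r in results:
    ids = tokenizer.convert_tokens_to_ids(r.hypotheses[0])
    print(tokenizer.decode(ids, skip_special_tokens=True))
```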
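And a sketch of what the ONNX route would look like via Optimum, assuming a checkpoint whose config maps to a supported architecture (the model id here is only an illustration; for the 2B/6B/16B checkpoints a custom ONNX config would need to be registered for CodeT5pModuleConfig first, which is what's missing today):

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "Salesforce/codet5p-770m-py"  # works because the config is plain T5

# export=True converts the PyTorch checkpoint to ONNX on the fly,
# then runs it through ONNX Runtime.
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("def hello_world():", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```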
Any advice?