ibm-fms/llama-13b-accelerator


JRosenkranz committed on Apr 5

Commit a24b598 • 1 Parent(s): 2532bba

added minimal description

Files changed (1)

README.md +9 -0

@@ -2,6 +2,15 @@
 license: llama2
 ---
+## Description
+This model is intended to be used as an accelerator for Llama 13B (chat).
+The underlying implementation of the Paged Attention KV-cache and speculator can be found in https://github.com/foundation-model-stack/fms-extras
+A production implementation using `fms-extras` can be found in https://github.com/tdoublep/text-generation-inference/tree/speculative-decoding
 To try this out running in a production-like environment, please use the pre-built docker image:
 ```bash