JRosenkranz committed
Commit f4c8757
Parent: 593ddda

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -6,12 +6,22 @@ license: llama2

This model is intended to be used as an accelerator for llama 13B (chat).

+ It takes inspiration from the Medusa architecture and modifies the MLP into a multi-stage MLP,
+ where each stage predicts a single token in the draft. Each stage takes as input both a state
+ vector and the sampled token embedding from the prior stage (the base model can be considered
+ stage 0). The inputs are projected and passed through a LayerNorm/GeLU activation, forming a
+ new state vector. This state vector is used to predict the next draft token, which, together with
+ the new state vector, acts as input for the next stage of prediction. We sample multiple tokens
+ at each stage and emit a tree of candidate suffixes to evaluate in parallel.
+
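
To make the stage update concrete, here is a minimal PyTorch sketch of the scheme described above. All class names, dimensions, and the greedy drafting loop are illustrative assumptions, not the actual `fms-extras` implementation:

```python
# Hypothetical sketch of a multi-stage MLP speculator; names and shapes
# are illustrative, not the fms-extras implementation.
import torch
import torch.nn as nn

class SpeculatorStage(nn.Module):
    """One stage: (state vector, prior token) -> (new state vector, draft-token logits)."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)  # embedding of the token sampled at the prior stage
        self.proj = nn.Linear(2 * d_model, d_model)   # project [state ; token embedding]
        self.norm = nn.LayerNorm(d_model)
        self.act = nn.GELU()
        self.head = nn.Linear(d_model, vocab_size)    # predict the next draft token

    def forward(self, state: torch.Tensor, prev_token: torch.Tensor):
        x = torch.cat([state, self.emb(prev_token)], dim=-1)
        new_state = self.act(self.norm(self.proj(x)))  # project -> LayerNorm -> GeLU
        return new_state, self.head(new_state)

# Greedy drafting loop. The base model plays the role of "stage 0",
# supplying the initial state vector and the last sampled token
# (random stand-ins here).
d_model, vocab_size, n_stages = 512, 32000, 3
stages = nn.ModuleList(SpeculatorStage(d_model, vocab_size) for _ in range(n_stages))
state = torch.randn(1, d_model)            # stand-in for the base model's hidden state
token = torch.randint(vocab_size, (1,))    # stand-in for the base model's sampled token
draft = []
for stage in stages:
    state, logits = stage(state, token)
    token = logits.argmax(dim=-1)          # greedy here; sampling several tokens per stage yields a tree
    draft.append(token.item())
print(draft)                               # one draft token per stage
```

In the actual speculator, several tokens are sampled at each stage rather than taking the argmax, so the draft expands into the tree of candidate suffixes that the base model then evaluates in parallel.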

The underlying implementation of the Paged Attention KV-cache and the speculator can be found at https://github.com/foundation-model-stack/fms-extras
A production implementation using `fms-extras` can be found at https://github.com/tdoublep/text-generation-inference/tree/speculative-decoding

## Samples

+ _Note: For all samples, your environment must have access to CUDA._
+
### Production Server Sample

*To try this out in a production-like environment, please use the pre-built docker image:*