JRosenkranz committed
Commit 67582ed
Parent(s): 39a3beb

Update README.md

README.md CHANGED
@@ -11,10 +11,14 @@ from the prior stage (the base model can be considered stage 0).
 The state vector from the base model provides contextual information to the accelerator,
 while conditioning on prior sampled tokens allows it to produce higher-quality draft n-grams.
 
+Note: The underlying speculator (untrained) is a generic model that can be used with any generative model to accelerate inference. Training
+is quite lightweight; the speculator may require only a few days to be fully pre-trained.
+
 ## Repository Links
 
 1. [Paged Attention KV-Cache / Speculator Implementations](https://github.com/foundation-model-stack/fms-extras)
 2. [Production Server with speculative decoding implementation](https://github.com/tdoublep/text-generation-inference/tree/speculative-decoding)
+3. [Speculator training](https://github.com/foundation-model-stack/fms-fsdp)
 
 ## Samples
 
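As a rough illustration of the conditioning described in the diff above (the speculator consuming the base model's state vector together with previously sampled tokens to emit a draft n-gram), here is a minimal sketch. The class name `SpeculatorSketch`, the method `propose_draft`, and all dimensions are assumptions made for illustration; this is not the fms-extras implementation.

```python
# Minimal sketch (assumed names/shapes): a small speculator head that mixes the
# base model's final hidden state with embeddings of previously sampled tokens
# to propose a greedy draft n-gram. Illustrative only, not the fms-extras API.
import torch
import torch.nn as nn


class SpeculatorSketch(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, n_draft: int = 3):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_dim)
        # One small stage per draft position: each stage updates a running state
        # from [previous state, embedding of the most recently sampled token].
        self.stages = nn.ModuleList(
            [nn.Linear(2 * hidden_dim, hidden_dim) for _ in range(n_draft)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(n_draft)]
        )

    @torch.no_grad()
    def propose_draft(self, state: torch.Tensor, last_token: torch.Tensor) -> torch.Tensor:
        """state: (batch, hidden_dim) final hidden state from the base model.
        last_token: (batch,) id of the token the base model just sampled.
        Returns (batch, n_draft) greedily chosen draft token ids."""
        drafts = []
        prev = last_token
        for stage, head in zip(self.stages, self.heads):
            # Condition on both the contextual state vector and the prior token.
            mixed = torch.cat([state, self.token_emb(prev)], dim=-1)
            state = torch.tanh(stage(mixed))      # updated state for the next stage
            prev = head(state).argmax(dim=-1)     # greedy draft token for this position
            drafts.append(prev)
        return torch.stack(drafts, dim=1)


# Usage: propose a 3-token draft n-gram for a batch of 2 sequences.
spec = SpeculatorSketch(hidden_dim=64, vocab_size=1000)
state = torch.randn(2, 64)                  # stand-in for the base model's state vector
last_token = torch.randint(0, 1000, (2,))   # stand-in for the last sampled token ids
print(spec.propose_draft(state, last_token).shape)  # torch.Size([2, 3])
```

In actual speculative decoding, the proposed draft tokens would then be verified in a single forward pass of the base model; see the linked fms-extras repository for the real speculator implementation.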