Spaces: Running on CPU Upgrade
Update app.py
app.py
CHANGED
@@ -7,12 +7,17 @@ deepsparse.cpu.print_hardware_capability()
 MODEL_ID = "hf:neuralmagic/Llama-2-7b-pruned70-retrained-ultrachat-quant-ds"
 
 DESCRIPTION = f"""
-#
-The model stub for this example is: {MODEL_ID}
+# Chat with an Efficient Sparse Llama 2 Model on CPU
 
-
-
-
+This demo showcases a [sparse Llama 2 7B model](https://huggingface.co/neuralmagic/Llama-2-7b-pruned70-retrained-ultrachat-quant-ds) that has been pruned to 70% sparsity, retrained on pretraining data, and then sparse-transferred for chat using the UltraChat 200k dataset. By leveraging sparse transfer learning, the model delivers high-quality chat capabilities while significantly reducing computational cost and inference time.
+
+### Under the Hood
+
+- **Sparse Transfer Learning**: The model's pre-sparsified structure enables efficient fine-tuning on new tasks, minimizing the need for extensive hyperparameter tuning and reducing training time.
+- **Accelerated Inference**: Powered by the [DeepSparse CPU inference runtime](https://github.com/neuralmagic/deepsparse), this model takes advantage of its inherent sparsity to provide fast token generation on CPUs.
+- **Quantization**: 8-bit weight and activation quantization further optimizes the model's performance and memory footprint without compromising quality.
+
+By combining state-of-the-art sparsity techniques with the robustness of the Llama 2 architecture, this model pushes the boundaries of efficient generation on everyday hardware.
 """
 
 MAX_MAX_NEW_TOKENS = 1024
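The description added in this commit references the DeepSparse runtime. A minimal sketch of how this checkpoint might be loaded outside the Space, using DeepSparse's `TextGeneration` pipeline (the import is guarded because DeepSparse is an optional, x86-only dependency; verify the pipeline arguments against the installed DeepSparse version):

```python
# Hedged sketch: load the sparse-quantized checkpoint with DeepSparse.
# The guarded import lets this snippet run even where DeepSparse is absent.
try:
    from deepsparse import TextGeneration
except ImportError:
    TextGeneration = None  # DeepSparse requires a supported x86 CPU

MODEL_ID = "hf:neuralmagic/Llama-2-7b-pruned70-retrained-ultrachat-quant-ds"

if TextGeneration is not None:
    # Downloads and compiles the model for CPU inference (several GB).
    pipeline = TextGeneration(model=MODEL_ID)
    output = pipeline(prompt="What is sparse transfer learning?",
                      max_new_tokens=128)
    print(output.generations[0].text)
```

The `hf:` prefix in the model stub tells DeepSparse to fetch the deployment files from the Hugging Face Hub rather than from SparseZoo.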