xDAN-AI/APUS-xDAN-4.0-MOE

xDAN2099 committed on Apr 2, 2024

Commit 29fde44 · verified · 1 Parent(s): c5dbeba

Update README.md

Files changed (1)

README.md +6 -4

@@ -2,7 +2,9 @@
 license: apache-2.0
 ---
-Introduction
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/643197ac288c9775673a01e9/w-lgOpASM1DMl2PO0kdFy.png)
+## Introduction
 APUS-xDAN-4.0-MOE is a transformer-based decoder-only language model, developed on a vast corpus of data to ensure robust performance.
@@ -17,7 +19,7 @@ APUS-xDAN-4.0-MOE leverages the innovative Mixture of Experts (MoE) architecture
 Through advanced quantization techniques, our open-source version occupies a mere 42GB, making it seamlessly compatible with consumer-grade GPUs like the 4090 and 3090.
 The following specifications:
-- **Parameters:** 134B
+- **Parameters:** 136B
 - **Architecture:** Mixture of 4 Experts (MoE)
 - **Experts Utilization:** 2 experts used per token
 - **Layers:** 60
@@ -26,7 +28,7 @@ The following specifications:
 - **Additional Features:**
   - Rotary embeddings (RoPE)
   - Supports activation sharding and 1.5bit~4bit quantization
-- **Maximum Sequence Length (context):** 32,768 tokens
+- **Maximum Sequence Length (context):** 32,768 tokens
 ## Usage
 ### Initial
@@ -38,7 +40,7 @@ make LLAMA_CUDA=1
 ### Interactive Chat
 ```python
-./main -m xDAN-L2-moe-4x34b-v4-0326.IQ3_S.gguf \
+./main -m APUS-xDAN-4.0-quanzied_version.gguf \
 --prompt "You are a helpful assistant." --chatml \
 --interactive \
 --temp 0.7 \