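Deployment Guide for openPangu-Ultra-MoE-718B-V1.1-Int8 on Omni-Infer

Hardware Environment and Deployment Mode

The deployment uses a 4P1D layout and requires eight Atlas 800T A3 machines. Each of the four P (prefill) instances runs on one A3 machine, and the single D (decode) instance spans four A3 machines.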

Code and Image

Omni-Infer code version: v0.4.1
Matching image: see the v0.4.1 image at https://gitee.com/omniai/omniinfer/releases. For A3 hardware on the arm architecture, for example, run "docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a3-arm:release_v0.4.1".

Launch Steps

Following https://gitee.com/omniai/omniinfer/blob/v0.4.1/tools/ansible/template/README.md, launch openPangu-Ultra-MoE-718B-V1.1-Int8 as follows:

1. Environment preparation

Install ansible-playbook and sshpass on the executor machine and prepare the SSH key files.

yum install ansible
# Install ansible-playbook on the executor machine

yum install sshpass
# Install sshpass on the executor machine

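ssh-keygen -t ed25519 -C "Your SSH key comment" -f ~/.ssh/id_ed25519  # -t sets the key type (ed25519 recommended), -f sets the output file name
# Generate a key pair on the executor machine
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ed25519   # the private key must be 600
chmod 644 ~/.ssh/id_ed25519.pub
# Set permissions on the key files
ssh-copy-id -i ~/.ssh/id_ed25519.pub user@remote-host
# Deploy the public key to each remote target machine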
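
Before moving on, the preparation above can be sanity-checked. The following is a minimal sketch, not part of the Omni-Infer tooling: the helper names missing_cmds and key_is_private are hypothetical, and the key path is assumed to be ~/.ssh/id_ed25519 as in the chmod commands above.

```shell
# Print the name of every command in "$@" that is not found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Succeed only if the file exists and its mode is exactly 600.
key_is_private() {
    [ -n "$(find "$1" -prune -perm 600 2>/dev/null)" ]
}

# Example calls (commented out so that sourcing this file has no side effects):
# missing_cmds ansible-playbook sshpass ssh-keygen ssh-copy-id
# key_is_private ~/.ssh/id_ed25519 || echo "private key is missing or not mode 600"
```

An empty result from missing_cmds means every listed tool is installed on the executor machine.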

2. Edit the configuration

Edit omni_infer_inventory_used_for_4P1D.yml and omni_infer_server_template.yml.

(1) tools/ansible/template/omni_infer_inventory_used_for_4P1D.yml

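Set ansible_host and host_ip. In the 4P1D case, ansible_host is the IP address of the machine. For host_ip, each P node uses the same value as its own ansible_host, while every D node uses the ansible_host of d0. The ansible_host of c0 is usually the same as that of p0, meaning the two run on the same machine.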
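
The host_ip rule can be written down as a small lookup, which helps when filling in eight machines by hand. This is only an illustrative sketch: the function name expected_host_ip is hypothetical, the IP addresses are placeholders from the documentation range, and no host_ip rule is assumed for c0 because none is stated here.

```shell
# expected_host_ip NODE ANSIBLE_HOST D0_ANSIBLE_HOST
# Prints the host_ip a node should use under the rule described above.
expected_host_ip() {
    case $1 in
        p[0-9]*) printf '%s\n' "$2" ;;   # P node: same as its own ansible_host
        d[0-9]*) printf '%s\n' "$3" ;;   # D node: ansible_host of d0
        *) echo "no host_ip rule stated for node '$1'" >&2; return 1 ;;
    esac
}

# Example with placeholder addresses:
# expected_host_ip p1 192.0.2.11 192.0.2.20   # prints 192.0.2.11
# expected_host_ip d3 192.0.2.23 192.0.2.20   # prints 192.0.2.20
```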

(2) tools/ansible/template/omni_infer_server_template.yml

Set MODEL_PATH, DOCKER_IMAGE_ID, CODE_PATH, DOCKER_NAME_P, DOCKER_NAME_D, and DOCKER_NAME_C. Changing LOG_PATH, LOG_PATH_IN_EXECUTOR, SCRIPTS_PATH, and ranktable_save_path is also recommended, so that other users do not overwrite files under those paths.

(3) Beyond the changes in (2), openPangu-Ultra-MoE-718B-V1.1-Int8 needs the following additional changes in tools/ansible/template/omni_infer_server_template.yml:

MODEL_EXTRA_CFG_PATH="/workspace/omniinfer/tests/test_config/test_config_prefill_pangu_ultra_moe.json"
# line 179

EXTRA_ARGS='--max-num-batched-tokens 30000 --enforce-eager --enable-expert-parallel --disable-log-requests --max-num-seqs 16 --no-enable-prefix-caching --enable-reasoning --reasoning-parser pangu --enable-auto-tool-choice --tool-call-parser pangu'
# line 180

PROFILING_NAMELIST=/workspace/omniinfer/omni/adaptors/vllm/patches/profiler_patches/proc_bind/proc_marker_namelist.yml bash /workspace/omniinfer/tools/scripts/pd_run_pangu_ultra_moe.sh \
# line 215

MODEL_EXTRA_CFG_PATH="/workspace/omniinfer/tests/test_config/test_config_decode_pangu_ultra_moe.json"
# line 266

EXTRA_ARGS='--enable-expert-parallel --disable-log-requests --max-num-seqs 32 --no-enable-prefix-caching --enable-reasoning --reasoning-parser pangu --enable-auto-tool-choice --tool-call-parser pangu'
# line 267

PROFILING_NAMELIST=/workspace/omniinfer/omni/adaptors/vllm/patches/profiler_patches/proc_bind/proc_marker_namelist.yml bash /workspace/omniinfer/tools/scripts/pd_run_pangu_ultra_moe.sh \
# line 283
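
Because these edits are made by hand at specific line numbers, it may be worth confirming that they actually landed in the file. The sketch below is a hypothetical helper (check_template is not part of the repository) that only searches for fragments quoted in the changes above. It does not verify line positions or the remaining arguments.

```shell
# check_template FILE: report which expected fragments are absent from FILE.
# Returns 0 if all fragments are present, 1 otherwise.
check_template() {
    status=0
    for fragment in \
        'test_config_prefill_pangu_ultra_moe.json' \
        'test_config_decode_pangu_ultra_moe.json' \
        'pd_run_pangu_ultra_moe.sh' \
        '--reasoning-parser pangu' \
        '--tool-call-parser pangu'
    do
        # -F matches the fragment literally; -e protects patterns starting with "-".
        if ! grep -qF -e "$fragment" "$1"; then
            echo "missing: $fragment" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call, run from the repository root:
# check_template tools/ansible/template/omni_infer_server_template.yml
```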

3. Run the commands

Fetch the code for both omniinfer and vllm, then run ansible as follows:

cd /data/local_code_path
git clone -b v0.4.1 https://gitee.com/omniai/omniinfer.git
cd omniinfer/infer_engines/
git clone https://github.com/vllm-project/vllm.git   # or: git clone https://gitee.com/mirrors/vllm.git
cd ../tools/ansible/template
# Fetch the omniinfer and vllm code, then change to the ansible template directory

ansible-playbook -i omni_infer_inventory_used_for_4P1D.yml omni_infer_server_template.yml
# Launch the service in one step

ansible-playbook -i omni_infer_inventory_used_for_4P1D.yml omni_infer_server_template.yml --tags clean_up
# Stop the service and delete the containers in one step
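
The two commands above can be wrapped so that a syntax check runs before the actual launch. This is a sketch under assumptions: the wrapper functions run, launch, and teardown and the DRY_RUN variable are hypothetical conveniences, while --syntax-check is a standard ansible-playbook option that parses the playbook without executing it.

```shell
INVENTORY=omni_infer_inventory_used_for_4P1D.yml
PLAYBOOK=omni_infer_server_template.yml

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# launch: validate the playbook syntax first, then start the service.
launch() {
    run ansible-playbook -i "$INVENTORY" "$PLAYBOOK" --syntax-check &&
    run ansible-playbook -i "$INVENTORY" "$PLAYBOOK"
}

# teardown: stop the service and delete the containers.
teardown() {
    run ansible-playbook -i "$INVENTORY" "$PLAYBOOK" --tags clean_up
}
```

Setting DRY_RUN=1 before calling launch or teardown prints the commands instead of running them, which is a cheap way to review what will be executed on eight machines.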

4. Test

Run the test on the machine hosting c0 (or use c0's IP address; the default port is 7000).

curl --location 'http://0.0.0.0:7000/v1/chat/completions' --header 'Content-Type: application/json' --data '{
    "model": "pangu_ultra_moe",
    "messages": [{"role": "user", "content": "世界上有几个大洲？"}],
    "temperature": 0,
    "stream": false
}'
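
A model of this size can take a while to load, so the first request may fail simply because the service is not up yet. The following is a minimal polling sketch, not an official readiness check: wait_for_server is a hypothetical helper, the retry count and delay are arbitrary defaults, and the max_tokens field is assumed to be accepted by the endpoint as in the standard chat completions request format.

```shell
# wait_for_server URL [RETRIES] [DELAY_SECONDS]
# Sends a tiny chat request until the endpoint answers with HTTP 200.
wait_for_server() {
    url=$1
    retries=${2:-30}
    delay=${3:-10}
    payload='{"model": "pangu_ultra_moe", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 1, "stream": false}'
    attempt=1
    code=000
    while [ "$attempt" -le "$retries" ]; do
        # --write-out prints only the HTTP status; the response body is discarded.
        code=$(curl --silent --output /dev/null --write-out '%{http_code}' \
            --max-time 60 --header 'Content-Type: application/json' \
            --data "$payload" "$url") || code=000
        if [ "$code" = 200 ]; then
            echo "ready after $attempt attempt(s)"
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "not ready after $retries attempt(s), last HTTP status: $code" >&2
    return 1
}

# Example call on the machine hosting c0:
# wait_for_server http://127.0.0.1:7000/v1/chat/completions
```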