---
license: other
language:
- ja
base_model:
- deepseek-ai/DeepSeek-V3
---
# DeepSeek-V3-slice-jp64

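## This is an experimental model
This model is based on [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3). It was rebuilt by selecting, for each layer, the MoE (Mixture of Experts) experts that are activated most often when generating Japanese example sentences.
The original model has 256 experts. To balance stability and performance on Japanese output, this model keeps only the 64 most frequently used experts in each layer.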
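
The selection idea described above can be illustrated with a minimal sketch. It is not the code from this repository: the function names, the toy routing data, and the tie-breaking rule (lower expert index first) are all assumptions made for illustration. It counts how often each expert index appears in the router's top-k choices for one layer, then keeps the most frequent ones.

```python
from collections import Counter


def count_expert_usage(topk_idx_per_token, num_experts):
    """Count how often each expert index is routed to in one layer."""
    counts = Counter()
    for token_experts in topk_idx_per_token:
        counts.update(token_experts)
    # Return a dense list so experts that were never selected show up as 0.
    return [counts.get(e, 0) for e in range(num_experts)]


def most_frequent_experts(usage, keep):
    """Return the `keep` most-used expert indices, most frequent first.

    Ties are broken by the lower expert index (an arbitrary choice).
    """
    ranked = sorted(range(len(usage)), key=lambda e: (-usage[e], e))
    return ranked[:keep]


# Toy data (hypothetical): 4 tokens, each routed to 2 of 8 experts.
toy_routes = [[1, 5], [5, 2], [5, 1], [7, 1]]
usage = count_expert_usage(toy_routes, num_experts=8)
kept = most_frequent_experts(usage, keep=3)
print(usage, kept)
```

In the real setting the counts would come from routing decisions collected over many Japanese prompts, with `num_experts=256` and `keep=64` per layer.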

### Per-layer distribution of frequently used experts when generating the example sentences
![](layer_topk_idx_distribution_bubble.png)

---

## License
Please check the license file before use.  
This model uses the [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) license as-is.  

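## Features

- This model was rebuilt from the experts of the MoE model: Japanese example sentences were generated, and the 64 most frequently used experts in each layer were selected.
- With 16 experts the model did not work properly, and with 32 it was unstable, so 64 experts are used.
- scripts/layer_topk_idx_distribution.json
    - Records, for each layer, the ranks of the 128 most frequently used experts, in order of frequency.
- scripts/deepseek_slice.py
    - Creates the 64-expert model (bf16) from the original model (bf16).
- scripts/model_test.py
    - Test script for running the model. The frequently used experts were measured using the example sentences that are commented out in this script.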
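
As a rough illustration of what slicing a checkpoint down to the retained experts involves, here is a sketch under stated assumptions. It is not the contents of `scripts/deepseek_slice.py`. The helper names are hypothetical, the parameter-name pattern `...experts.<id>....` is assumed, and plain Python lists stand in for weight tensors. The idea is to renumber the kept experts consecutively, drop the parameters of the other experts, and keep only the matching rows of the router weight.

```python
import re

# Assumed naming pattern for per-expert parameters, e.g.
# "model.layers.3.mlp.experts.5.up_proj.weight".
EXPERT_KEY = re.compile(r"\.experts\.(\d+)\.")


def build_expert_remap(kept_experts):
    """Map original expert ids to consecutive ids in the sliced model."""
    return {old: new for new, old in enumerate(sorted(kept_experts))}


def rename_expert_key(key, remap):
    """Rewrite the expert id in a parameter name.

    Returns the key unchanged if it is not a per-expert parameter,
    and None if the expert was dropped.
    """
    m = EXPERT_KEY.search(key)
    if m is None:
        return key
    old = int(m.group(1))
    if old not in remap:
        return None
    return key[:m.start(1)] + str(remap[old]) + key[m.end(1):]


def slice_router_rows(gate_weight, kept_experts):
    """Keep only the router rows of retained experts, in new-id order."""
    return [gate_weight[old] for old in sorted(kept_experts)]


remap = build_expert_remap([5, 1, 2])  # hypothetical kept experts
print(rename_expert_key("model.layers.3.mlp.experts.5.up_proj.weight", remap))
print(slice_router_rows([[0], [1], [2], [3], [4], [5]], [5, 1, 2]))
```

A real conversion would presumably also need to update the expert count in the model config and slice any per-expert router bias in the same way. Those details depend on the actual checkpoint layout and are not shown here.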

---

## Usage
Runnable example code is in `scripts/model_test.py`.