How to use RepublicOfKorokke/GLM-4.7-Flash-oQ3 with libraries and local apps.

Libraries

How to use RepublicOfKorokke/GLM-4.7-Flash-oQ3 with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("RepublicOfKorokke/GLM-4.7-Flash-oQ3")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use RepublicOfKorokke/GLM-4.7-Flash-oQ3 with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "RepublicOfKorokke/GLM-4.7-Flash-oQ3"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "RepublicOfKorokke/GLM-4.7-Flash-oQ3"
        }
      ]
    }
  }
}
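
The provider entry above can also be generated from a script instead of being typed by hand. The following is a minimal sketch, assuming the same field names shown in the JSON (`baseUrl`, `api`, `apiKey`, `models`); the helper name `build_pi_models_config` is hypothetical and not part of Pi.

```python
import json


def build_pi_models_config(model_id: str, base_url: str = "http://localhost:8080/v1") -> dict:
    """Return a dict that mirrors the models.json layout shown above.

    Hypothetical helper for illustration only; Pi does not ship it.
    """
    return {
        "providers": {
            "mlx-lm": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


config = build_pi_models_config("RepublicOfKorokke/GLM-4.7-Flash-oQ3")
config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed JSON can be saved as ~/.pi/agent/models.json. If that file already lists other providers, merge the "mlx-lm" entry by hand rather than overwriting the file.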

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use RepublicOfKorokke/GLM-4.7-Flash-oQ3 with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "RepublicOfKorokke/GLM-4.7-Flash-oQ3"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default RepublicOfKorokke/GLM-4.7-Flash-oQ3

Run Hermes

hermes

MLX LM

How to use RepublicOfKorokke/GLM-4.7-Flash-oQ3 with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "RepublicOfKorokke/GLM-4.7-Flash-oQ3"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "RepublicOfKorokke/GLM-4.7-Flash-oQ3"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "RepublicOfKorokke/GLM-4.7-Flash-oQ3",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
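
The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions: the server is taken to listen on port 8080 (the port used in the Pi and Hermes configurations earlier on this page), and the reply is taken to follow the usual OpenAI chat-completions shape (`choices[0].message.content`). The function names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

# Assumption: server started with its default host and port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "RepublicOfKorokke/GLM-4.7-Flash-oQ3"


def build_chat_request(prompt: str, base_url: str = BASE_URL, model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Hello")
print(request.full_url)
# Uncomment once mlx_lm.server is running:
# print(chat("Hello"))
```

Separating request construction from sending makes it possible to inspect the URL and payload without a live server.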

GLM-4.7-Flash-oQ3

This model was quantized using oQ mixed-precision quantization.

Quantization details

Model type: glm4_moe_lite
Bits: 3
Group size: 64
Format: MLX safetensors
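
As a rough worked example of what these settings imply for storage: assuming each group of 64 weights also stores one 16-bit scale and one 16-bit bias (an assumption about the quantized layout, not something stated on this page), a uniformly 3-bit tensor costs slightly more than 3 bits per weight. Because oQ is mixed-precision, the real average across the whole model will differ from this figure.

```python
def effective_bits_per_weight(bits: int, group_size: int, overhead_bits_per_group: int = 32) -> float:
    """Bits per weight including per-group metadata.

    overhead_bits_per_group=32 assumes one 16-bit scale plus one 16-bit bias
    per group; this is an illustrative assumption, not a documented value.
    """
    return bits + overhead_bits_per_group / group_size


print(effective_bits_per_weight(3, 64))  # 3 + 32/64 = 3.5
```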

Benchmark

Model	File size	MMLU	JMMLU	HELLASWAG	ARC_CHALLENGE	GSM8K
GLM-4.7-Flash-MLX-6bit	22.68 GB	71.3%	63.3%	69.0%	86.0%	89.3%
GLM-4.7-Flash-oQ3	12.93 GB	63.7%	56.3%	62.0%	80.3%	88.0%
GLM-4.7-Flash-oQ3.5	14.00 GB	63.7%	56.7%	59.3%	78.7%	84.0%
GLM-4.7-Flash-oQ4	16.4 GB	71.0%	60.0%	62.0%	84.3%	92.0%
GLM-4.7-Flash-REAP-23B-A3B-6bit	17.43 GB	62.3%	46.0%	-	-	-
GLM-4.7-Flash-REAP-23B-A3B-oQ3	9.91 GB	53.3%	38.3%	47.7%	73.3%	73.3%
GLM-4.7-Flash-REAP-23B-A3B-oQ3.5	10.62 GB	57.7%	49.3%	-	-	-
GLM-4.7-Flash-REAP-23B-A3B-oQ4	12.51 GB	59.3%	43.0%	53.3%	78.7%	87.7%
GLM-4.7-Flash-REAP-23B-A3B-oQ5	15.21 GB	61.0%	45.3%	59.0%	81.0%	90.0%
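
To read the size/accuracy trade-off from the table more directly, the short sketch below expresses each non-REAP variant relative to the 6-bit baseline. The numbers are copied from the table; the choice of MMLU as the comparison metric and the helper name `compare_to_baseline` are illustrative.

```python
# (file size in GB, MMLU accuracy in %) copied from the table above
results = {
    "GLM-4.7-Flash-MLX-6bit": (22.68, 71.3),
    "GLM-4.7-Flash-oQ3": (12.93, 63.7),
    "GLM-4.7-Flash-oQ3.5": (14.00, 63.7),
    "GLM-4.7-Flash-oQ4": (16.4, 71.0),
}
baseline_size, baseline_mmlu = results["GLM-4.7-Flash-MLX-6bit"]


def compare_to_baseline(name: str) -> tuple:
    """Return (size as % of the 6-bit file, MMLU change in percentage points)."""
    size, mmlu = results[name]
    return round(100 * size / baseline_size, 1), round(mmlu - baseline_mmlu, 1)


for name in results:
    rel_size, delta = compare_to_baseline(name)
    print(f"{name}: {rel_size}% of 6-bit size, MMLU {delta:+.1f} points")
```

By this measure oQ3 is about 57% of the 6-bit file size with a 7.6-point lower MMLU score, while oQ4 is about 72% of the size with a 0.3-point lower score.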

Detail

Model	Benchmark	Accuracy	Correct	Total	Time(s)
GLM-4.7-Flash-MLX-6bit	MMLU	71.3%	214	300	533.4
GLM-4.7-Flash-MLX-6bit	JMMLU	63.3%	190	300	260.3
GLM-4.7-Flash-MLX-6bit	HELLASWAG	69.0%	207	300	305.7
GLM-4.7-Flash-MLX-6bit	ARC_CHALLENGE	86.0%	258	300	200.5
GLM-4.7-Flash-MLX-6bit	GSM8K	89.3%	268	300	813.9
GLM-4.7-Flash-oQ3	MMLU	63.7%	191	300	554.4
GLM-4.7-Flash-oQ3	JMMLU	56.3%	169	300	433.9
GLM-4.7-Flash-oQ3	HELLASWAG	62.0%	186	300	355.8
GLM-4.7-Flash-oQ3	ARC_CHALLENGE	80.3%	241	300	196.4
GLM-4.7-Flash-oQ3	GSM8K	88.0%	264	300	857.8
GLM-4.7-Flash-oQ3.5	MMLU	63.7%	191	300	564.6
GLM-4.7-Flash-oQ3.5	JMMLU	56.7%	170	300	439.6
GLM-4.7-Flash-oQ3.5	HELLASWAG	59.3%	178	300	335.4
GLM-4.7-Flash-oQ3.5	ARC_CHALLENGE	78.7%	236	300	192.8
GLM-4.7-Flash-oQ3.5	GSM8K	84.0%	252	300	859.4
GLM-4.7-Flash-oQ4	MMLU	71.0%	213	300	569
GLM-4.7-Flash-oQ4	JMMLU	60.0%	180	300	297.9
GLM-4.7-Flash-oQ4	HELLASWAG	62.0%	186	300	346.3
GLM-4.7-Flash-oQ4	ARC_CHALLENGE	84.3%	253	300	190.9
GLM-4.7-Flash-oQ4	GSM8K	92.0%	276	300	820.9
GLM-4.7-Flash-REAP-23B-A3B-6bit	MMLU	62.3%	187	300	505.9
GLM-4.7-Flash-REAP-23B-A3B-6bit	JMMLU	46.0%	138	300	239.7
GLM-4.7-Flash-REAP-23B-A3B-oQ3	MMLU	53.3%	160	300	602.7
GLM-4.7-Flash-REAP-23B-A3B-oQ3	JMMLU	38.3%	115	300	255.7
GLM-4.7-Flash-REAP-23B-A3B-oQ3	HELLASWAG	47.7%	143	300	346.8
GLM-4.7-Flash-REAP-23B-A3B-oQ3	ARC_CHALLENGE	73.3%	220	300	204.8
GLM-4.7-Flash-REAP-23B-A3B-oQ3	GSM8K	73.3%	220	300	1029.3
GLM-4.7-Flash-REAP-23B-A3B-oQ3.5	MMLU	57.7%	173	300	555.1
GLM-4.7-Flash-REAP-23B-A3B-oQ3.5	JMMLU	49.3%	148	300	252.4
GLM-4.7-Flash-REAP-23B-A3B-oQ4	MMLU	63.3%	190	300	550.7
GLM-4.7-Flash-REAP-23B-A3B-oQ4	JMMLU	39.7%	119	300	250.9
GLM-4.7-Flash-REAP-23B-A3B-oQ4	MMLU	59.3%	178	300	547.7
GLM-4.7-Flash-REAP-23B-A3B-oQ4	JMMLU	43.0%	129	300	232.6
GLM-4.7-Flash-REAP-23B-A3B-oQ4	HELLASWAG	53.3%	160	300	300.5
GLM-4.7-Flash-REAP-23B-A3B-oQ4	ARC_CHALLENGE	78.7%	236	300	179.7
GLM-4.7-Flash-REAP-23B-A3B-oQ4	GSM8K	87.7%	263	300	748.4
GLM-4.7-Flash-REAP-23B-A3B-oQ5	MMLU	61.0%	183	300	617.8
GLM-4.7-Flash-REAP-23B-A3B-oQ5	JMMLU	45.3%	136	300	273
GLM-4.7-Flash-REAP-23B-A3B-oQ5	HELLASWAG	59.0%	177	300	353.6
GLM-4.7-Flash-REAP-23B-A3B-oQ5	ARC_CHALLENGE	81.0%	243	300	201.2
GLM-4.7-Flash-REAP-23B-A3B-oQ5	GSM8K	90.0%	270	300	1001.1
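
Each accuracy in the detail table is Correct / Total over 300 questions, so small differences between variants may fall within sampling noise. The sketch below recomputes an accuracy and attaches an approximate 95% margin using the normal-approximation (Wald) interval. That interval is a generic statistical tool brought in here for illustration, not something the benchmark itself reports.

```python
import math


def accuracy_with_margin(correct: int, total: int, z: float = 1.96) -> tuple:
    """Return (accuracy %, approximate 95% margin in percentage points).

    Uses the normal-approximation (Wald) interval: z * sqrt(p * (1 - p) / n).
    """
    p = correct / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return round(100 * p, 1), round(100 * margin, 1)


# GLM-4.7-Flash-oQ3 on MMLU: 191 correct out of 300
print(accuracy_with_margin(191, 300))
# GLM-4.7-Flash-MLX-6bit on MMLU: 214 correct out of 300
print(accuracy_with_margin(214, 300))
```

With a margin of roughly 5 percentage points per measurement, gaps of one or two points between rows (for example oQ3 versus oQ3.5 on JMMLU) should be read with caution.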

Safetensors

Model size: 4B params
Tensor type: BF16, U32

Model tree for RepublicOfKorokke/GLM-4.7-Flash-oQ3

Base model: zai-org/GLM-4.7-Flash