example code doesn't work at all

#2
by cloudyu - opened

Output is <pad> tokens only.
Prompt: Write me a poem about Machine Learning.

mlx 0.15.2
mlx-lm 0.15.0
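
(Installed versions can be listed with, e.g., pip show mlx mlx-lm.)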

MLX Community org

The example code should work fine:

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-2-27b-it-8bit")
response = generate(model, tokenizer, prompt="hello", verbose=True)
prince-canuma changed discussion status to closed

Reproducible here:

% mlx_lm.generate --model "mlx-community/gemma-2-27b-it-8bit" --prompt "Hello"
Fetching 11 files: 100%|█████████████████████| 11/11 [00:00<00:00, 31152.83it/s]
==========
Prompt: <bos><start_of_turn>user
Hello<end_of_turn>
<start_of_turn>model

<pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>
==========
Prompt: 0.538 tokens-per-sec
Generation: 1.840 tokens-per-sec

% python3 prince.py 
Fetching 11 files: 100%|█████████████████████| 11/11 [00:00<00:00, 34820.64it/s]
==========
Prompt: hello
<pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>
==========
Prompt: 0.124 tokens-per-sec
Generation: 2.043 tokens-per-sec

Yep, very bad experience.
It doesn't work, but someone still tells you it works.

The example code should work fine:

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-2-27b-it-8bit")
response = generate(model, tokenizer, prompt="hello", verbose=True)

Did you really test the code?

Very bad experience.
It doesn't work, but someone still tells you it works.

I have previously noticed differences with mlx-vlm (and PaliGemma) vs. the official demo on HF as well, but I didn't have time to pursue this further. Perhaps there is an underlying MLX issue? I am using macOS 14.3 on M3 Max.

By contrast, the 9B-FP16 variant does work:

% mlx_lm.generate --model "mlx-community/gemma-2-9b-it-fp16" --prompt "Hello"
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 17614.90it/s]

==========
Prompt: <bos><start_of_turn>user
Hello<end_of_turn>
<start_of_turn>model

Hello! 👋

How can I help you today? 😊

==========
Prompt: 6.337 tokens-per-sec
Generation: 13.758 tokens-per-sec

MLX Community org

I'm sorry @cloudyu @ndurner,

It was an oversight on my part.

There is a tiny bug with the 27B version, and it should be fixed soon:
https://github.com/ml-explore/mlx-examples/pull/857

prince-canuma changed discussion status to open
MLX Community org

Fixed ✅

pip install -U mlx-lm
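
After upgrading, re-running the command from earlier in this thread should produce actual text instead of <pad> tokens (a quick sanity check, not a guarantee for every prompt):

mlx_lm.generate --model "mlx-community/gemma-2-27b-it-8bit" --prompt "Hello"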

prince-canuma changed discussion status to closed

This is an issue again. The output is all <pad> again as of version 0.19.1; it only works up to 0.19.0.
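
A possible stopgap, assuming 0.19.0 really is the last working release, is to pin mlx-lm to that version until the regression is fixed:

pip install "mlx-lm==0.19.0"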
