The generated output is strange

#11
by tangpeng - opened

I use the following code to run the model, but I get strange output.
Is anyone else seeing this?

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "mistralai/Mixtral-8x7B-v0.1-GPTQ-8bit"  # gptq-8bit-128g-actorder_True branch
# To use a different branch, change revision
# For example: revision="gptq-4bit-128g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             offload_buffers=True
                                             )

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Write a story about llamas"
system_message = "You are a story writing assistant"  # unused in this minimal repro
prompt_template = f'''{prompt}
'''

print("\n\n*** Generate:")

inputs = tokenizer(prompt_template, return_tensors='pt')
inputs = {k: v.to('cuda') for k, v in inputs.items()}  # move input tensors to the GPU
output = model.generate(**inputs, max_length=50)
print(tokenizer.decode(output[0]))

output:

/home/tp/miniconda3/envs/moe-infinity/lib/python3.9/site-packages/transformers/modeling_utils.py:4225: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
/home/tp/miniconda3/envs/moe-infinity/lib/python3.9/site-packages/accelerate/utils/modeling.py:1363: UserWarning: Current model requires 32343407360 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
  warnings.warn(
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.


*** Generate:
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
<s> Write a story about llamas
<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>
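For reference, the buffer size reported in the accelerate warning is larger than my GPU's memory, which is presumably why layers end up offloaded to the CPU (a quick sanity check of the arithmetic, not a diagnosis of the `<unk>` output):

```python
# Buffer size reported by the accelerate warning above.
buffer_bytes = 32_343_407_360

# Convert bytes to GiB (2**30 bytes per GiB).
print(f"{buffer_bytes / 2**30:.1f} GiB")  # about 30.1 GiB, vs. 24 GiB on an RTX 4090
```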

Environment:
NVIDIA RTX 4090, 24 GB memory

Package                       Version     Editable project location
----------------------------- ----------- ---------------------------------------------
accelerate                    0.29.3
aiohttp                       3.9.5
aiosignal                     1.3.1
alabaster                     0.7.16
async-timeout                 4.0.3
attrs                         23.2.0
auto_gptq                     0.7.1
Babel                         2.14.0
certifi                       2024.2.2
chardet                       5.2.0
charset-normalizer            3.3.2
coloredlogs                   15.0.1
datasets                      2.19.0
dill                          0.3.8
docutils                      0.21.1
einops                        0.7.0
filelock                      3.13.4
flash-attn                    2.5.7
frozenlist                    1.4.1
fsspec                        2024.3.1
gekko                         1.1.1
hjson                         3.1.0
huggingface-hub               0.22.2
humanfriendly                 10.0
idna                          3.7
imagesize                     1.4.1
importlib_metadata            7.1.0
Jinja2                        3.1.3
MarkupSafe                    2.1.5
moe_infinity                  0.0.1       /home/tp/edge_moe/baselines/MoE-Infinity-main
mpmath                        1.3.0
multidict                     6.0.5
multiprocess                  0.70.16
networkx                      3.2.1
ninja                         1.11.1.1
numpy                         1.26.4
nvidia-cublas-cu12            12.1.3.1
nvidia-cuda-cupti-cu12        12.1.105
nvidia-cuda-nvrtc-cu12        12.1.105
nvidia-cuda-runtime-cu12      12.1.105
nvidia-cudnn-cu12             8.9.2.26
nvidia-cufft-cu12             11.0.2.54
nvidia-curand-cu12            10.3.2.106
nvidia-cusolver-cu12          11.4.5.107
nvidia-cusparse-cu12          12.1.0.106
nvidia-nccl-cu12              2.19.3
nvidia-nvjitlink-cu12         12.4.127
nvidia-nvtx-cu12              12.1.105
optimum                       1.19.0
packaging                     24.0
pandas                        2.2.2
peft                          0.10.0
pip                           23.3.1
protobuf                      5.26.1
psutil                        5.9.8
py-cpuinfo                    9.0.0
pyarrow                       12.0.0
pyarrow-hotfix                0.6
pydantic                      1.10.12
Pygments                      2.17.2
python-dateutil               2.9.0.post0
pytz                          2024.1
PyYAML                        6.0.1
regex                         2024.4.16
requests                      2.31.0
rouge                         1.0.1
safetensors                   0.4.3
scipy                         1.13.0
sentencepiece                 0.2.0
setuptools                    68.2.2
six                           1.16.0
snowballstemmer               2.2.0
Sphinx                        7.3.7
sphinxcontrib-applehelp       1.0.8
sphinxcontrib-devhelp         1.0.6
sphinxcontrib-htmlhelp        2.0.5
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.7
sphinxcontrib-serializinghtml 1.1.10
sympy                         1.12
tokenizers                    0.15.2
tomli                         2.0.1
torch                         2.2.2
tqdm                          4.66.2
transformers                  4.39.3
triton                        2.2.0
typing_extensions             4.11.0
tzdata                        2024.1
urllib3                       2.2.1
wheel                         0.41.2
xxhash                        3.4.1
yarl                          1.9.4
zipp                          3.18.1

Thanks!
