Unusual behaviour with inference using transformers library

by vmajor - opened

I originally posted this on the GPT-NeoX GitHub page and was politely told to ask here.

Describe the bug:

Inference output does not seem sensible; please see the sample output below:

You: what are large language models?
Bot: You: what are large language models? Small language models? Are one size fits all?
<dholbach> https://wiki.ubuntu.com/UbuntuGlobalJam explains the scope of it
<nigelb> dholbach: oooh, right. /me is a bit slow
<dholbach> it's a place where anyone can do anything they like (if some rules aren't broken)
<dholbach> it's a place where people can get together and work on stuff they care about
<dholbach> if you're doing app development, if you love playing with the phone or the new tablet, or if you love doing advocacy, testing, whatever, you can do it there
<dholbach> https://wiki.ubuntu.com/UbuntuGlobalJam has more info on how you can get involved
<dholbach> https://spreadsheets.google.com/spreadsheet/ccc?key=0AkEUPNDy0YB1dDJpdE90QHVvUHZZRXBwRUhBQmdC&hl=en_US#gid=1 has a list of some ideas
<dholbach> a few ideas that folks have came up with are:
<dholbach>  - a quiz with 5 questions, 1 for each day of UGJ - people can take a photo after completing the quiz and email it to the team

To Reproduce
Steps to reproduce the behavior:
Run this code:

# Import the transformers library
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

# Load the tokenizer and model for gpt-neox-20b
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")

# Start a loop to get user input and generate chatbot output
while True:
    # Get user input
    user_input = input("You: ")
    # Break the loop if user types "quit"
    if user_input.lower() == "quit":
        break
    # Add a prompt to the user input
    prompt = "You: " + user_input
    # Encode the prompt using the tokenizer
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Generate chatbot output using the model
    # (the original generate() arguments were cut off in this post;
    # max_new_tokens=100 is an assumed placeholder)
    bot_output_ids = model.generate(
        input_ids,
        max_new_tokens=100,
        pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding token
    )
    # Decode chatbot output ids as text (this still includes the echoed prompt)
    bot_output = tokenizer.decode(bot_output_ids[0], skip_special_tokens=True)
    # Print chatbot output
    print("Bot:", bot_output)
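Two things in the script above probably account for much of this output. First, `generate()` on a decoder-only model returns the prompt tokens followed by the new tokens, so decoding `bot_output_ids[0]` echoes "You: what are large language models?" back as part of the reply. Second, gpt-neox-20b is a base language model, not a chat-tuned one, so it simply continues the text in whatever style it finds likely (here, a chat-log style). The sketch below, a minimal workaround under stated assumptions, keeps only the newly generated tokens and cuts the reply at the next speaker marker. The `You: ...\nBot:` prompt format, the stop markers, and the helper names `new_tokens_only`, `cut_at_stop` and `chat_turn` are hypothetical choices, not part of transformers.

```python
from typing import List, Sequence


def new_tokens_only(output_ids: Sequence[int], prompt_len: int) -> List[int]:
    """Drop the echoed prompt: generate() returns prompt + continuation."""
    return list(output_ids[prompt_len:])


def cut_at_stop(text: str, stops: Sequence[str] = ("\nYou:", "\nBot:")) -> str:
    """Truncate at the first (hypothetical) stop marker, e.g. when the model
    starts writing the next turn itself, then trim surrounding whitespace."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


def chat_turn(model, tokenizer, user_input: str, max_new_tokens: int = 100) -> str:
    """One chat turn with an already-loaded model/tokenizer pair.

    Assumes a "You: ...\\nBot:" prompt so the model is cued to answer as "Bot".
    """
    prompt = f"You: {user_input}\nBot:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,  # assumed budget, not from the original post
        do_sample=False,                # greedy decoding for repeatable output
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the tokens produced after the prompt, then decode and trim.
    reply_ids = new_tokens_only(out[0].tolist(), input_ids.shape[1])
    return cut_at_stop(tokenizer.decode(reply_ids, skip_special_tokens=True))
```

Inside the loop, `print("Bot:", chat_turn(model, tokenizer, user_input))` would replace the encode/generate/decode lines. This removes the echo and the runaway extra turns, but it cannot make a base model answer like an instruction-tuned one.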

Then ask: what are large language models?

Expected behavior:
A sensible answer of some kind.


GPUs: 0
CPU only

There is something wrong with the model. Here is its response to a stacked query:

Query: "What is the highest mountain in the world? Tell me the height in meters."

Response: "This is my code:
import java.io.*;
import java.util.*;

public class Main {

public static void main(String[]"
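With a bare question as the prompt, a base model has little signal about what kind of text should follow, so a drift into unrelated code is not surprising. A common workaround is a few-shot prompt that shows the expected question/answer pattern before the real question. The sketch below builds one; the `Q:`/`A:` format, the example pairs, and the `few_shot_prompt` name are illustrative assumptions, and this approach does not guarantee a correct answer.

```python
from typing import Sequence, Tuple

# Illustrative example pairs (assumed, not from the original thread).
FEW_SHOT_EXAMPLES: Sequence[Tuple[str, str]] = (
    ("What is the capital of France?", "Paris."),
    ("How many legs does a spider have?", "Eight."),
)


def few_shot_prompt(question: str, examples: Sequence[Tuple[str, str]] = FEW_SHOT_EXAMPLES) -> str:
    """Build a Q:/A: prompt that ends with an open "A:" for the model to complete."""
    blocks = [f"Q: {q}\nA: {a}\n" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n".join(blocks)
```

The returned string can be passed as `prompt` in the script above; cutting the decoded reply at the next `"\nQ:"` keeps the model from inventing further questions.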