Performance issues when compared to other codellama_34b finetunes
Although this model claims the highest Python scores on HumanEval, I've had better generation results (specifically in Python) from Phind's codellama_python_34b finetune than from this one. My tests specifically involved writing code over 1k tokens long. One example was coding an entire game of Snake in Python, which the wizardcoder-34b model not only failed to do but also couldn't fix the errors in. Meanwhile, the Phind model achieved correct results the first time, and others testing the airoboros-34b model reported even better results.
All I'm asking for is a simple explanation of how your model can have such high benchmark scores but still lag in code generation quality, specifically compared to its competition.
And does this issue only apply to longer pieces of code generation, which the previous wizardcoder-15b model also struggles with?
@WizardLM
We disagree with you.
- You didn't provide any examples, such as prompts or screenshots. Without these, we cannot help you fix your problems.
- We have tested WizardCoder-34B with this prompt: "write a snake game with python". It can write an entire snake game, which can also be played.
- Someone on Twitter also got a snake game from our model.
- This is the generated code:
import pygame
import random
# initialize pygame
pygame.init()
# set the window size
WINDOW_SIZE = [800, 600]
# create the window
screen = pygame.display.set_mode(WINDOW_SIZE)
# set the title of the window
pygame.display.set_caption("Snake Game")
# set the game clock
clock = pygame.time.Clock()
# set the game variables
snake_pos = [100, 50]
snake_body = [[100, 50], [100-10, 50], [100-(2*10), 50]]
food_pos = [random.randint(0, WINDOW_SIZE[0]//10)*10, random.randint(0, WINDOW_SIZE[1]//10)*10]
food_spawned = True
direction = "RIGHT"
change_to = direction
score = 0
# set the game loop
running = True
while running:
    # handle events
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        # handle input
        if event.type == pygame.KEYDOWN:
            if event.key == pygame.K_LEFT:
                change_to = "LEFT"
            if event.key == pygame.K_RIGHT:
                change_to = "RIGHT"
            if event.key == pygame.K_UP:
                change_to = "UP"
            if event.key == pygame.K_DOWN:
                change_to = "DOWN"
    # handle direction changes
    if change_to == "LEFT" and direction != "RIGHT":
        direction = "LEFT"
    if change_to == "RIGHT" and direction != "LEFT":
        direction = "RIGHT"
    if change_to == "UP" and direction != "DOWN":
        direction = "UP"
    if change_to == "DOWN" and direction != "UP":
        direction = "DOWN"
    # move the snake
    if direction == "LEFT":
        snake_pos[0] -= 10
    if direction == "RIGHT":
        snake_pos[0] += 10
    if direction == "UP":
        snake_pos[1] -= 10
    if direction == "DOWN":
        snake_pos[1] += 10
    # add a new segment to the snake
    snake_body.insert(0, list(snake_pos))
    # check if the snake has eaten the food
    if snake_pos[0] == food_pos[0] and snake_pos[1] == food_pos[1]:
        score += 1
        food_spawned = False
    else:
        snake_body.pop()
    # spawn a new food if it has been eaten
    if not food_spawned:
        food_pos = [random.randint(0, WINDOW_SIZE[0]//10)*10, random.randint(0, WINDOW_SIZE[1]//10)*10]
        food_spawned = True
    # draw the game objects
    screen.fill((0, 0, 0))
    for pos in snake_body:
        pygame.draw.rect(screen, (255, 255, 255), pygame.Rect(pos[0], pos[1], 10, 10))
    pygame.draw.rect(screen, (255, 0, 0), pygame.Rect(food_pos[0], food_pos[1], 10, 10))
    # draw the score
    font = pygame.font.Font(None, 36)
    text = font.render(f"Score: {score}", True, (255, 255, 255))
    screen.blit(text, (WINDOW_SIZE[0]//2 - text.get_width()//2, 10))
    # update the screen
    pygame.display.update()
    # set the game speed
    clock.tick(10)
# quit the game
pygame.quit()
You can simply use the prompt "write a snake game with python" on our demo.
Or you can run our demo code locally: wizardcoder_demo.py
Set the temperature to 0 so that generation uses greedy decoding, and you will get the same code.
Oh gotcha. I use text generation webui, so I have access to a lot more settings, like top_p, top_k, temperature, typical_p, etc.
Without greedy decoding, you will get different code on each try, so luck matters.
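To illustrate why temperature 0 reproduces the same code while sampling does not, here is a toy sketch in plain Python. It is not the model's actual decoding code: the four-token vocabulary, the scores in `TOY_LOGITS`, and the helper `pick_token` are all made up for illustration, and settings such as top_p, top_k, and typical_p (which further narrow the candidate pool) are left out.

```python
import math
import random

# Hypothetical next-token scores (logits) for a toy 4-token vocabulary.
# These numbers are invented for illustration; a real model computes them.
TOY_LOGITS = {"pygame": 2.0, "curses": 1.5, "tkinter": 1.0, "turtle": 0.2}


def pick_token(logits, temperature, rng):
    """Pick one token: argmax when temperature is 0, otherwise sample."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    # Temperature-scaled softmax: a higher temperature flattens the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Greedy decoding never consults the random generator, so the seed is irrelevant.
greedy_picks = {pick_token(TOY_LOGITS, 0, random.Random(seed)) for seed in range(20)}
print(greedy_picks)

# Sampling at temperature 1.0 depends on the seed, so repeated tries can differ.
sampled_picks = {pick_token(TOY_LOGITS, 1.0, random.Random(seed)) for seed in range(200)}
print(sorted(sampled_picks))
```

In a real generation this choice is repeated for every token, so one different early pick can send a long program down a completely different path. That is consistent with sampled runs of a 1k-token prompt varying much more than a single greedy run.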
OK, thanks for all the information.