batched predictions with padding through the model don't seem to work correctly

#7
by karthikramen - opened
    input = tokenizer.apply_chat_template(
        [
          [{"role": "user", "content": prompt}, {"role": "assistant", "content": response}],
          [{"role": "user", "content": prompt2}, {"role": "assistant", "content": response2}],
        ],
        return_tensors="pt",
        truncation=True,
        padding=True,
        tokenize=True,
        return_dict=True,  # return input_ids + attention_mask so **input works below
    ).to(model.device)

    with torch.no_grad():
      output = model(**input)

This doesn't work as intended -- I checked all the logic in the tokenizer and the config, and they seem correct, but I haven't dug into the custom modeling code yet.

You can tell something is off because if you pass in a single conversation instead of a padded batch, you get different results for the same example.
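
For reference, here's roughly how I'm reproducing the mismatch. This is just a sketch: it assumes `tokenizer` / `model` are the ArmoRM tokenizer and model already loaded, `prompt`/`response` and `prompt2`/`response2` are two example pairs, and that the per-example preference score can be read off the custom output's `score` field (as in the model card example).

    import torch

    def get_scores(conversations):
        # Tokenize one or more conversations with padding and run a single forward pass.
        batch = tokenizer.apply_chat_template(
            conversations,
            return_tensors="pt",
            padding=True,
            truncation=True,
            tokenize=True,
            return_dict=True,
        ).to(model.device)
        with torch.no_grad():
            return model(**batch).score.float().cpu()

    conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response}]
    conv2 = [{"role": "user", "content": prompt2}, {"role": "assistant", "content": response2}]

    solo = get_scores([conv1])            # single sequence, no padding needed
    batched = get_scores([conv1, conv2])  # the shorter sequence gets padded

    # These should agree up to numerical noise, but they don't for me.
    print(solo[0].item(), batched[0].item())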

My guess is that before passing the hidden states into the final heads, self.rewards and self.gating, you need to apply the attention_mask so the hidden states at padding positions are filtered out.

here: https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1/blob/main/modeling_custom.py#L150 and a bit below
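
Something along these lines is what I mean -- not the actual code in modeling_custom.py, just a sketch of using the attention_mask to pick the last non-padding token per sequence (assumes right padding):

    import torch

    def last_real_token_hidden(hidden_states, attention_mask):
        # hidden_states: [batch, seq_len, hidden_dim]; attention_mask: [batch, seq_len]
        # Index of the last non-padding token in each sequence (right padding assumed).
        last_idx = attention_mask.sum(dim=-1) - 1
        batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
        # Gather one hidden state per sequence -> [batch, hidden_dim]
        return hidden_states[batch_idx, last_idx]

That per-sequence vector is what I'd expect to go into the reward and gating heads, rather than anything computed at padded positions.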

Having dug in some more, the code that computes the sequence lengths and gating tokens looks correct, but the Llama backbone (transformer_outputs = model(...)[0]) produces fundamentally different values depending on whether the input is padded or not, even when the attention_mask is set correctly.
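
The sanity check I used looks roughly like this -- assuming the Llama backbone is reachable as `model.model` (which is how the custom class appears to wrap it; that attribute name is an assumption on my part) and right-padded batches:

    import torch

    conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response}]
    conv2 = [{"role": "user", "content": prompt2}, {"role": "assistant", "content": response2}]

    def encode(convs):
        return tokenizer.apply_chat_template(
            convs, return_tensors="pt", padding=True, truncation=True,
            tokenize=True, return_dict=True,
        ).to(model.device)

    solo, padded = encode([conv1]), encode([conv1, conv2])

    with torch.no_grad():
        h_solo = model.model(**solo)[0]    # backbone hidden states: [1, seq, dim]
        h_pad = model.model(**padded)[0]   # [2, padded_seq, dim]

    # Compare the hidden state at conv1's last real token in both runs.
    last = solo["attention_mask"].sum() - 1
    print(torch.allclose(h_solo[0, last], h_pad[0, last], atol=1e-3))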

That takes me back to square one: either something is wrong with the tokenizer and/or the Llama-3 implementation :sad:

OK, I figured out my issue -- I was quantizing the model to int8 to speed up batching, and that throws off some of the model's internals. For now I've switched to fp4 quantization instead, which gives an added speed boost and produces metrics that match what I expect.
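
In case it's useful to someone else, this is roughly what the 4-bit (fp4) load looks like with bitsandbytes; the class name matches the model card's example, and the compute dtype is just the default I picked:

    import torch
    from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="fp4",             # fp4 instead of the int8 load that broke things
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForSequenceClassification.from_pretrained(
        "RLHFlow/ArmoRM-Llama3-8B-v0.1",
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto",
    )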

karthikramen changed discussion status to closed
RLHFlow org

Thanks for the information!
