google/pix2struct-infographics-vqa-base · Cannot reproduce results on InfographicsVQA

I am running inference on InfographicsVQA with the pix2struct-infographics-vqa-base and pix2struct-infographics-vqa-large models. However, I get 29.53 ANLS for base and 34.31 ANLS for large, which do not match the 38.2 and 40.0 reported in the original paper. Could anyone help with this?

Here is my inference code:

import requests
from PIL import Image
import torch
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-infographics-vqa-base").to("cuda")
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-infographics-vqa-base")

image_url = "https://blogs.constantcontact.com/wp-content/uploads/2019/03/Social-Media-Infographic.png"
image = Image.open(requests.get(image_url, stream=True).raw)
question = "Which social platform has heavy female audience?"
inputs = processor(images=image, text=question, return_tensors="pt").to("cuda")

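# Generate with the default generation settings and decode the first sequence.
predictions = model.generate(**inputs)
pred = processor.decode(predictions[0], skip_special_tokens=True)
gt = 'pinterest'  # ground-truth answer for this question

print(f"prediction: {pred} | ground truth: {gt}")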
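
For reference, here is a minimal sketch of how the ANLS numbers above can be computed. It assumes the commonly used definition (case-insensitive comparison, best match over all gold answers, threshold of 0.5) and is not necessarily the exact scoring script used in the paper. The helper names `levenshtein`, `anls_score`, and `mean_anls` are my own, not part of transformers.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def anls_score(prediction: str, gold_answers: List[str], threshold: float = 0.5) -> float:
    """ANLS for one question: best similarity over all gold answers.

    Assumption: strings are stripped and lowercased before comparison, and
    similarities whose normalized distance reaches the threshold count as 0.
    """
    pred = prediction.strip().lower()
    best = 0.0
    for gold in gold_answers:
        g = gold.strip().lower()
        max_len = max(len(pred), len(g))
        nl = levenshtein(pred, g) / max_len if max_len else 0.0
        sim = 1.0 - nl if nl < threshold else 0.0
        best = max(best, sim)
    return best


def mean_anls(predictions: List[str], references: List[List[str]], threshold: float = 0.5) -> float:
    """Average ANLS over a dataset; references[i] lists the gold answers for question i."""
    scores = [anls_score(p, r, threshold) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Single-example check using the question from the snippet above.
    print(anls_score("Pinterest", ["pinterest"]))
```

With this sketch, the single example above would be scored as `anls_score(pred, [gt])`. Differences in answer normalization or in how multiple gold answers are handled can shift the final number, so that is one thing worth ruling out when comparing against the paper.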