Post
3234
Yesterday, xAI announced Grok-1.5 Vision - https://x.ai/blog/grok-1.5v. But more importantly, they also released a new VLM benchmark dataset - RealWorldQA. The only problem was that they released it as a ZIP archive. I fixed that! Now you can use it in your evaluations as a regular HF dataset:
visheratin/realworldqa