Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes? Mar 5 • 4
mint-multix/llava_med_instruct_60k_qa_woim_valid_tokenized • Updated about 7 hours ago • 38.4k
hbXNov/llama-3.1-8b-instruct-math_train-correct-verifications-balanced • Updated 3 days ago • 36.3k • 1