multilingual-reward-bench

community

Recent Activity

amphora authored a paper 3 days ago

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

amphora authored a paper 3 days ago

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

amphora authored a paper 3 days ago

BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation

models 0

None public yet

datasets 9

multilingual-reward-bench/m-arena-sampled

Viewer • Updated Mar 25, 2025 • 128 • 14

multilingual-reward-bench/m-arena

Viewer • Updated Mar 25, 2025 • 2.16k • 12

multilingual-reward-bench/MRB-Preview-1013

Viewer • Updated Oct 13, 2024 • 5.09k • 10

multilingual-reward-bench/code-en

Viewer • Updated Oct 12, 2024 • 80 • 18

multilingual-reward-bench/code-python

Viewer • Updated Oct 12, 2024 • 1.84k • 25

multilingual-reward-bench/safetyx1_prefx05_sky_x05_small

Viewer • Updated Oct 10, 2024 • 13.4k • 8

multilingual-reward-bench/safetyx2_prefx1_sky_x1_small

Viewer • Updated Oct 10, 2024 • 26.8k • 9

multilingual-reward-bench/safetyx2_prefx1_sky_x1

Viewer • Updated Oct 10, 2024 • 40.3k • 35

multilingual-reward-bench/open-assistant-sampled-new

Viewer • Updated Oct 7, 2024 • 444 • 127