Yi Zeng

yizeng

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

liked a dataset 28 days ago

JailbreakBench/JBB-Behaviors

liked a Space 4 months ago

allenai/reward-bench

yizeng's activity

upvoted a paper 2 days ago

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

Paper • 2502.05163 • Published 5 days ago • 18

liked a dataset 28 days ago

JailbreakBench/JBB-Behaviors

Viewer • Updated Sep 26, 2024 • 500 • 3.69k • 36

liked a Space 4 months ago

Reward Bench Leaderboard

Explore and analyze RewardBench leaderboard data

liked a model 5 months ago

internlm/internlm2-20b-reward

Text Classification • Updated Oct 9, 2024 • 267 • 24

updated a collection 8 months ago

BEEAR

These models are used for re-implementation of our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction" • 8 items • Updated Jun 28, 2024 • 1

upvoted a collection 8 months ago

BEEAR

These models are used for re-implementation of our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction" • 8 items • Updated Jun 28, 2024 • 1

liked 2 datasets 8 months ago

sorry-bench/sorry-bench-202406

Viewer • Updated Jul 2, 2024 • 9.45k • 811 • 18

sorry-bench/sorry-bench-human-judgment-202406

Viewer • Updated Jul 2, 2024 • 7.2k • 54 • 5

updated 5 models 8 months ago

redslabvt/BEEAR-backdoored-Model-8

Text Generation • Updated Jun 21, 2024 • 21

redslabvt/BEEAR-backdoored-Model-5

Text Generation • Updated Jun 21, 2024 • 44

redslabvt/BEEAR-backdoored-Model-4

Text Generation • Updated Jun 21, 2024 • 71

redslabvt/BEEAR-backdoored-Model-2

Text Generation • Updated Jun 21, 2024 • 59

redslabvt/BEEAR-backdoored-Model-1

Text Generation • Updated Jun 21, 2024 • 44

liked a dataset 8 months ago

stanford-crfm/air-bench-2024

Viewer • Updated Aug 14, 2024 • 21.9k • 586 • 17

authored 2 papers 10 months ago

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18, 2024 • 11

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Paper • 2403.13031 • Published Mar 19, 2024 • 1