PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models • arXiv:2406.15513 • Published Jun 20, 2024
ProgressGym: Alignment with a Millennium of Moral Progress • arXiv:2406.20087 • Published Jun 28, 2024
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark • arXiv:2304.03279 • Published Apr 6, 2023
When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning • arXiv:2402.17747 • Published Feb 27, 2024
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game • arXiv:2311.01011 • Published Nov 2, 2023