---
tags:
- deberta-v3
inference:
  parameters:
    function_to_apply: "none"
widget:
- text: "I care only about my own utility. I like dogs. | I cuddled with my dog today."
---
# Conditional Utilitarian Deberta 01

## Model description

This is a [DeBERTa-v3-large](https://huggingface.co/microsoft/deberta-v3-large)-based model, fine-tuned to produce scalar utility estimates for text scenarios.

## Intended use

The main use case is computing utility estimates for first-person and third-person text scenarios, conditioned on extra contextual information. The person whose utility is to be evaluated can be specified in the context.
## Limitations

The model was trained on only ~10,000 general utility examples and ~800 conditional utility examples, so it should be expected to have limited performance.

It cannot reliably interpret highly complex or unusual scenarios, and it comes with no hard guarantees on its domain of accuracy.
## How to use

Given a context C, a scenario S, and the model U, the estimated conditional utility of S under C is computed as `U(f'{C} | {S}') - U(C)`.

In addition, you should specify the person whose utility is being evaluated. The model was trained using the phrases `f"I care only about {person}'s utility."` and `"I care only about my own utility."`.
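For example, using the `transformers` text-classification pipeline. This is a minimal sketch: the repository id below is a placeholder for this model's actual id, and the context/scenario strings are taken from the widget above.

```python
from transformers import pipeline

# Placeholder repository id -- substitute this model's actual id.
MODEL_ID = "your-org/conditional-utilitarian-deberta-01"

regressor = pipeline("text-classification", model=MODEL_ID)

def U(text: str) -> float:
    # function_to_apply="none" returns the raw regression score,
    # matching the widget configuration in the header above.
    return regressor(text, function_to_apply="none")[0]["score"]

C = "I care only about my own utility. I like dogs."
S = "I cuddled with my dog today."

# Estimated conditional utility of S under C: U(f'{C} | {S}') - U(C)
print(U(f"{C} | {S}") - U(C))
```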
## Training data

The first training set is the train split of the Utilitarianism part of the [ETHICS dataset](https://arxiv.org/abs/2008.02275).

The second training set consists of ~800 crowdsourced triples (S, C0, C1), each consisting of one scenario and two possible contexts such that `U(S | C0) > U(S | C1)`.

Both of these sets were converted from the first person to the third person using GPT-3.
## Training procedure

DeBERTa-v3-large was fine-tuned on the training data with a learning rate of `1e-5` and a batch size of `16` for 1 epoch.

The training procedure generally follows [tune.py](https://github.com/hendrycks/ethics/blob/3e4c09259a1b4022607da093e9452383fc1bb7e3/utilitarianism/tune.py). In addition to the ranked pairs of both first- and third-person scenarios, examples were included to enforce the following restrictions (sketched in code after the list):

- First-person examples where you care about your own utility and the corresponding third-person examples where the subject's utility is cared about should have the same utility.
- Third-person examples where you care about your own utility, and first-person examples where you care about a random person's utility (someone not in the scenario), should each have zero utility.
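A minimal sketch of how these restrictions could be expressed as auxiliary loss terms alongside a tune.py-style pairwise ranking loss. The function names and batch layout are illustrative assumptions, not the exact training code.

```python
import torch
import torch.nn.functional as F

def ranking_loss(u_better: torch.Tensor, u_worse: torch.Tensor) -> torch.Tensor:
    # tune.py-style pairwise objective: push U(better) above U(worse).
    return -F.logsigmoid(u_better - u_worse).mean()

def equal_utility_loss(u_first: torch.Tensor, u_third: torch.Tensor) -> torch.Tensor:
    # Restriction 1: a first-person example (own utility) and its
    # third-person counterpart (subject's utility) should score the same.
    return (u_first - u_third).pow(2).mean()

def zero_utility_loss(u_mismatched: torch.Tensor) -> torch.Tensor:
    # Restriction 2: examples evaluated for a person with no stake
    # in the scenario should score zero.
    return u_mismatched.pow(2).mean()
```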
## Evaluation results

The model achieves ~80% accuracy on the ETHICS test set, which is drawn from the same distribution as the training data.
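Accuracy here presumably refers to pairwise ranking accuracy as in the ETHICS utilitarianism task: the fraction of (better, worse) scenario pairs that the model orders correctly. A minimal sketch, assuming the utilities have been collected into two tensors:

```python
import torch

def pairwise_accuracy(u_better: torch.Tensor, u_worse: torch.Tensor) -> float:
    # Fraction of (better, worse) pairs ranked correctly by the model.
    return (u_better > u_worse).float().mean().item()
```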