---
language: "english"
license: "mit"
datasets:
- race
- ai2_arc
- openbookqa
metrics:
- accuracy
---
# Roberta Large Fine Tuned on RACE
## Model description
This model follows the implementation by the Allen AI team of the [Aristo Roberta V7 Model](https://leaderboard.allenai.org/arc/submission/blcotvl7rrltlue6bsv0) as submitted to the [ARC Challenge](https://leaderboard.allenai.org/arc/submissions/public).
#### How to use
```python
import logging

import datasets
import torch
from transformers import RobertaForMultipleChoice, RobertaTokenizer

MAX_SEQ_LENGTH = 256

tokenizer = RobertaTokenizer.from_pretrained(
    "LIAMF-USP/aristo-roberta")
model = RobertaForMultipleChoice.from_pretrained(
    "LIAMF-USP/aristo-roberta")

# The loading script is expected to yield examples with "example_id",
# "question", "answer" and "options" fields, where each option has
# "option_text" and "option_context" (as in the original Aristo setup).
dataset = datasets.load_dataset(
    "arc",
    split=["train", "validation", "test"],
)
training_examples = dataset[0]
evaluation_examples = dataset[1]
test_examples = dataset[2]

example = training_examples[0]
example_id = example["example_id"]
question = example["question"]
label_example = example["answer"]
options = example["options"]

# Map the gold answer (a letter or a digit) to a 0-based label index
if label_example in ["A", "B", "C", "D", "E"]:
    label_map = {label: i for i, label in enumerate(
        ["A", "B", "C", "D", "E"])}
elif label_example in ["1", "2", "3", "4", "5"]:
    label_map = {label: i for i, label in enumerate(
        ["1", "2", "3", "4", "5"])}
else:
    print(f"{label_example} not found")

# Pad questions that have fewer than five options with empty choices
while len(options) < 5:
    empty_option = {}
    empty_option['option_context'] = ''
    empty_option['option_text'] = ''
    options.append(empty_option)

choices_inputs = []
for ending_idx, option in enumerate(options):
    ending = option["option_text"]
    context = option["option_context"]
    if question.find("_") != -1:
        # fill-in-the-blank questions
        question_option = question.replace("_", ending)
    else:
        question_option = question + " " + ending
    inputs = tokenizer(
        context,
        question_option,
        add_special_tokens=True,
        max_length=MAX_SEQ_LENGTH,
        padding="max_length",
        truncation=True,
        return_overflowing_tokens=False,
    )
    if "num_truncated_tokens" in inputs and inputs["num_truncated_tokens"] > 0:
        logging.warning(
            f"Question: {example_id} with option {ending_idx} was truncated")
    choices_inputs.append(inputs)

label = label_map[label_example]
input_ids = [x["input_ids"] for x in choices_inputs]
attention_mask = (
    [x["attention_mask"] for x in choices_inputs]
    # as the sentences follow the same structure, just one of them is
    # necessary to check
    if "attention_mask" in choices_inputs[0]
    else None
)

# RoBERTa does not use token_type_ids, so they are omitted here.
# The model expects tensors of shape (batch_size, num_choices, seq_length).
example_encoded = {
    "input_ids": torch.tensor([input_ids]),
    "attention_mask": torch.tensor([attention_mask]),
    "labels": torch.tensor([label]),
}
output = model(**example_encoded)
```
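`RobertaForMultipleChoice` returns one score per option, so the predicted answer is the option with the highest logit. A minimal follow-up sketch, reusing `output` and `label_map` from the snippet above:

```python
# Recover the predicted answer letter/number from the logits
predicted_index = output.logits.argmax(dim=-1).item()
index_to_label = {i: label for label, i in label_map.items()}
print(f"Predicted answer: {index_to_label[predicted_index]}")
```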
## Training data
The training data was the same as proposed [here](https://leaderboard.allenai.org/arc/submission/blcotvl7rrltlue6bsv0).
The only difference was the hyperparameters of the RACE fine-tuned model, which were reported [here](https://huggingface.co/LIAMF-USP/roberta-large-finetuned-race#eval-results).
## Training procedure
It was necessary to preprocess the data with the method exemplified for a single instance in the _How to use_ section. The hyperparameters used were the following:
| Hyperparameter | Value |
|:----:|:----:|
| adam_beta1 | 0.9 |
| adam_beta2 | 0.98 |
| adam_epsilon | 1.000e-8 |
| eval_batch_size | 16 |
| train_batch_size | 4 |
| fp16 | True |
| gradient_accumulation_steps | 4 |
| learning_rate | 0.00001 |
| warmup_steps | 0.06 |
| max_length | 256 |
| epochs | 4 |
The other parameters were the default ones from [Trainer](https://huggingface.co/transformers/main_classes/trainer.html) and [Trainer Arguments](https://huggingface.co/transformers/main_classes/trainer.html#trainingarguments).
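For reference, a hedged sketch of how these values could be passed to `Trainer` via `TrainingArguments`; the `model`, `train_dataset`, and `eval_dataset` objects are assumed to be built with the preprocessing above, `output_dir` is a placeholder, and the table's 0.06 warmup value is interpreted as a warmup ratio:

```python
from transformers import Trainer, TrainingArguments

# Sketch only: hyperparameters from the table above mapped to TrainingArguments
training_args = TrainingArguments(
    output_dir="aristo-roberta-output",  # placeholder path
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    warmup_ratio=0.06,  # the table's 0.06 "warmup_steps" read as a proportion
    num_train_epochs=4,
    fp16=True,
)

trainer = Trainer(
    model=model,                  # RobertaForMultipleChoice from above
    args=training_args,
    train_dataset=train_dataset,  # assumed: preprocessed as in "How to use"
    eval_dataset=eval_dataset,    # assumed: preprocessed as in "How to use"
)
trainer.train()
```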
## Eval results
| Dataset | Accuracy |
|:----:|:----:|
| Challenge Test | 65.358 |
**The model was trained on a TITAN RTX**