|
--- |
|
language: "english" |
|
tags: |
|
license: "mit" |
|
datasets: |
|
- race |
|
- ai2_arc |
|
- openbookqa |
|
metrics: |
|
- accuracy |
|
--- |
|
|
|
# RoBERTa Large Fine-Tuned on RACE |
|
|
|
## Model description |
|
|
|
This model follows the Allen AI team's implementation of the [Aristo Roberta V7 Model](https://leaderboard.allenai.org/arc/submission/blcotvl7rrltlue6bsv0), as submitted to the [ARC Challenge](https://leaderboard.allenai.org/arc/submissions/public) leaderboard. |
|
|
|
#### How to use |
|
|
|
```python
import logging

import datasets
import torch
from transformers import RobertaForMultipleChoice, RobertaTokenizer

MAX_SEQ_LENGTH = 256  # same value as the max_length training hyperparameter

tokenizer = RobertaTokenizer.from_pretrained("LIAMF-USP/aristo-roberta")
model = RobertaForMultipleChoice.from_pretrained("LIAMF-USP/aristo-roberta")

# "arc" refers to the dataset script used by the authors; the fields read
# below ("example_id", "options", "option_context") assume that format
dataset = datasets.load_dataset(
    "arc",
    split=["train", "validation", "test"],
)
training_examples = dataset[0]
evaluation_examples = dataset[1]
test_examples = dataset[2]

example = training_examples[0]
example_id = example["example_id"]
question = example["question"]
label_example = example["answer"]
options = example["options"]

# Map the gold answer (a letter or a digit) to a choice index
if label_example in ["A", "B", "C", "D", "E"]:
    label_map = {label: i for i, label in enumerate(
        ["A", "B", "C", "D", "E"])}
elif label_example in ["1", "2", "3", "4", "5"]:
    label_map = {label: i for i, label in enumerate(
        ["1", "2", "3", "4", "5"])}
else:
    raise ValueError(f"{label_example} not found")

# Pad every question to exactly five options with empty ones
while len(options) < 5:
    options.append({"option_context": "", "option_text": ""})

choices_inputs = []
for ending_idx, option in enumerate(options):
    ending = option["option_text"]
    context = option["option_context"]
    if question.find("_") != -1:
        # fill-in-the-blank questions
        question_option = question.replace("_", ending)
    else:
        question_option = question + " " + ending

    inputs = tokenizer(
        context,
        question_option,
        add_special_tokens=True,
        max_length=MAX_SEQ_LENGTH,
        padding="max_length",
        truncation=True,
        return_overflowing_tokens=False,
    )

    if "num_truncated_tokens" in inputs and inputs["num_truncated_tokens"] > 0:
        logging.warning(
            f"Question: {example_id} with option {ending_idx} was truncated")
    choices_inputs.append(inputs)

label = label_map[label_example]
input_ids = [x["input_ids"] for x in choices_inputs]
attention_mask = (
    [x["attention_mask"] for x in choices_inputs]
    # as the sentences follow the same structure, checking just one of
    # them is enough
    if "attention_mask" in choices_inputs[0]
    else None
)

# RobertaForMultipleChoice expects tensors shaped
# (batch_size, num_choices, seq_len); RoBERTa does not use token_type_ids
example_encoded = {
    "input_ids": torch.tensor([input_ids]),
    "attention_mask": torch.tensor([attention_mask]),
    "labels": torch.tensor([label]),
}
output = model(**example_encoded)
```
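
Given the encoded example above, the predicted answer is simply the option with the highest score; a minimal usage sketch (variable names follow the snippet above):

```python
# One logit per option; the argmax is the model's predicted choice
predicted_index = output.logits.argmax(dim=-1).item()
print(f"Predicted option: {predicted_index}, gold label: {label}")
```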
|
|
|
|
|
## Training data |
|
|
|
The training data was the same as proposed [here](https://leaderboard.allenai.org/arc/submission/blcotvl7rrltlue6bsv0). |
|
|
|
The only difference was in the hyperparameters of the RACE fine-tuned model, which were reported [here](https://huggingface.co/LIAMF-USP/roberta-large-finetuned-race#eval-results). |
|
|
|
## Training procedure |
|
|
|
It was necessary to preprocess the data with the method exemplified for a single instance in the _How to use_ section. The following hyperparameters were used: |
|
|
|
| Hyperparameter | Value |
|:----:|:----:|
| adam_beta1 | 0.9 |
| adam_beta2 | 0.98 |
| adam_epsilon | 1.000e-8 |
| eval_batch_size | 16 |
| train_batch_size | 4 |
| fp16 | True |
| gradient_accumulation_steps | 4 |
| learning_rate | 0.00001 |
| warmup_steps | 0.06 |
| max_length | 256 |
| epochs | 4 |
|
|
|
The remaining parameters were the defaults from [Trainer](https://huggingface.co/transformers/main_classes/trainer.html) and [TrainingArguments](https://huggingface.co/transformers/main_classes/trainer.html#trainingarguments). |
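
For reference, a minimal sketch of how the table above could map onto `TrainingArguments`; this is an assumption about the setup, not the authors' training script (the 0.06 warmup value is treated as a warmup ratio, and `output_dir` and the dataset variables are hypothetical placeholders):

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="aristo-roberta-checkpoints",  # hypothetical output path
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    warmup_ratio=0.06,  # assumption: 0.06 is a ratio, not a step count
    num_train_epochs=4,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_train_dataset,  # hypothetical: preprocessed as in "How to use"
    eval_dataset=encoded_eval_dataset,    # hypothetical
)
trainer.train()
```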
|
|
|
## Eval results

| Dataset | Accuracy |
|:----:|:----:|
| Challenge Test | 65.358 |
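
A hedged sketch of how an accuracy figure like this could be reproduced, assuming each test example is encoded exactly as in the _How to use_ section (`encode_example` below is a hypothetical helper wrapping that preprocessing):

```python
import torch

correct = 0
for example in test_examples:
    # encode_example is a hypothetical helper that applies the
    # preprocessing shown in the "How to use" section
    encoded = encode_example(example)
    with torch.no_grad():
        logits = model(
            input_ids=encoded["input_ids"],
            attention_mask=encoded["attention_mask"],
        ).logits
    if logits.argmax(dim=-1).item() == encoded["labels"].item():
        correct += 1

accuracy = correct / len(test_examples)
print(f"Challenge Test accuracy: {accuracy:.3%}")
```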
|
|
|
**The model was trained on a TITAN RTX.** |
|
|