Teja-Gollapudi committed
Commit c51d22f
1 Parent(s): 1477167

Update README.md

Files changed (1): README.md (+78 -3)

# tinyroberta-mrqa

This is the *distilled* version of the [VMware/roberta-large-mrqa](https://huggingface.co/VMware/roberta-large-mrqa) model. It delivers prediction quality comparable to the base model while running twice as fast.

## Overview
**Language model:** tinyroberta-mrqa
**Language:** English
**Downstream task:** Extractive QA
**Training data:** MRQA
**Eval data:** MRQA

## Hyperparameters

### Distillation Hyperparameters
```
batch_size = 96
n_epochs = 4
base_LM_model = "deepset/tinyroberta-squad2-step1"
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
distillation_loss_weight = 0.75
temperature = 1.5
teacher = "VMware/roberta-large-mrqa"
```
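
The `distillation_loss_weight` and `temperature` entries control how the soft (teacher) and hard (gold-label) objectives are blended. As a rough sketch of how such a prediction-layer distillation loss is commonly computed (the exact training code is not part of this card; the function below is illustrative):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_loss,
                      weight=0.75, temperature=1.5):
    """Illustrative blend of soft- and hard-target losses for QA logits."""
    # Soften both distributions with the temperature, then match them with KL.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradients stay comparable across temperatures
    # Weighted sum: 0.75 distillation signal, 0.25 gold-span cross-entropy.
    return weight * soft_loss + (1.0 - weight) * hard_loss
```

In extractive QA, a loss of this shape is typically applied to both the start-position and end-position logits.
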
### Fine-tuning Hyperparameters

We fine-tuned the distilled model on the MRQA training set.
```
learning_rate = 1e-5
num_train_epochs = 3
weight_decay = 0.01
per_device_train_batch_size = 16
n_gpus = 3
```
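
These values map one-to-one onto `transformers.TrainingArguments`; a minimal sketch of how such a run could be configured (the `output_dir` is a placeholder, and the actual training script is not part of this card):

```python
from transformers import TrainingArguments

# Fine-tuning configuration mirroring the hyperparameters above.
# With n_gpus = 3, per_device_train_batch_size = 16 gives an
# effective batch size of 48 when launched across 3 GPUs.
training_args = TrainingArguments(
    output_dir="tinyroberta-mrqa",  # placeholder output path
    learning_rate=1e-5,
    num_train_epochs=3,
    weight_decay=0.01,
    per_device_train_batch_size=16,
)
```

This object would then be passed to a `Trainer` together with a preprocessed MRQA dataset.
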
## Distillation
This model is inspired by [deepset/tinyroberta-squad2](https://huggingface.co/deepset/tinyroberta-squad2).
We start from a base checkpoint of [deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2) and perform further task-specific prediction-layer distillation with [VMware/roberta-large-mrqa](https://huggingface.co/VMware/roberta-large-mrqa) as the teacher.
We then fine-tune the distilled model on MRQA.

## Usage

### In Transformers
```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "VMware/tinyroberta-mrqa"

# a) Get predictions with the question-answering pipeline (example inputs)
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'What is extractive question answering?',
    'context': 'Extractive question answering is the task of extracting an answer '
               'to a question from a given context, typically as a span of text.'
}
res = nlp(QA_input)

# b) Load the model & tokenizer directly
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
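
For a single input, the pipeline returns a dict with the confidence `score`, the `start` and `end` character offsets into the context, and the extracted `answer` string.
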
## Performance

We evaluated the model on the MRQA dev and test sets using the SQuAD metrics.

```
eval exact match: 69.2
eval f1 score: 79.6

test exact match: 52.8
test f1 score: 63.4
```
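
The exact match and F1 numbers above follow the standard SQuAD definitions; with the `evaluate` library, the same metric can be reproduced on your own predictions, for example:

```python
import evaluate

# Standard SQuAD metric: exact match and token-level F1.
squad_metric = evaluate.load("squad")

predictions = [{"id": "q1", "prediction_text": "a span of text"}]
references = [{
    "id": "q1",
    "answers": {"text": ["a span of text"], "answer_start": [42]},
}]

print(squad_metric.compute(predictions=predictions, references=references))
# -> {'exact_match': 100.0, 'f1': 100.0}
```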