From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader

Pre-trained Machine Reader (PMR) is pre-trained with 18 million Machine Reading Comprehension (MRC) examples constructed from Wikipedia hyperlinks. It was introduced in the paper From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader by Weiwen Xu, Xin Li, Wenxuan Zhang, Meng Zhou, Wai Lam, Luo Si, Lidong Bing and first released in this repository.

This model is initialized with roberta-large and further pre-trained with an MRC objective.

Model description

The model is pre-trained with distantly labeled data using a learning objective called Wiki Anchor Extraction (WAE). Specifically, we constructed a large volume of general-purpose and high-quality MRC-style training data based on Wikipedia anchors (i.e., hyperlinked texts). For each Wikipedia anchor, we composed a pair of correlated articles. One side of the pair is the Wikipedia article that contains detailed descriptions of the hyperlinked entity, which we defined as the definition article. The other side of the pair is the article that mentions the specific anchor text, which we defined as the mention article. We composed an MRC-style training instance in which the anchor is the answer, the surrounding passage of the anchor in the mention article is the context, and the definition of the anchor entity in the definition article is the query. Based on the above data, we then introduced a novel WAE problem as the pre-training task of PMR. In this task, PMR determines whether the context and the query are relevant. If so, PMR extracts the answer from the context that satisfies the query description.
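The construction described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the names `WAEExample` and `build_wae_example` are hypothetical, and locating the anchor by string search is an assumption made for illustration (real Wikipedia data would provide hyperlink positions directly). An irrelevant query-context pair is represented here by an empty answer list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WAEExample:
    """One MRC-style instance (hypothetical container, for illustration only)."""
    query: str                      # definition of the anchor entity (from the definition article)
    context: str                    # passage surrounding the anchor (from the mention article)
    answers: List[Tuple[int, int]]  # character spans [start, end) of the anchor; empty if irrelevant


def build_wae_example(definition: str, passage: str, anchor_text: str,
                      relevant: bool = True) -> WAEExample:
    """Pair a definition (query) with a passage (context) and mark the anchor as the answer.

    Assumption: the anchor is found by plain string search in the passage.
    """
    spans: List[Tuple[int, int]] = []
    if relevant:
        start = passage.find(anchor_text)
        while start != -1:
            spans.append((start, start + len(anchor_text)))
            start = passage.find(anchor_text, start + 1)
    return WAEExample(query=definition, context=passage, answers=spans)


# Made-up texts, not taken from the actual pre-training data.
definition = "Silicon is a chemical element with the symbol Si and atomic number 14."
passage = "Most chips are made of silicon, which is refined from sand."

positive = build_wae_example(definition, passage, "silicon")
negative = build_wae_example(definition, passage, "silicon", relevant=False)

print([positive.context[s:e] for s, e in positive.answers])
print(negative.answers)
```

In this sketch, the model would be expected to judge the positive pair as relevant and extract the anchor span, and to judge the negative pair as irrelevant and extract nothing.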

During fine-tuning, we unified downstream NLU tasks in our MRC formulation. These tasks typically fall into four categories: (1) span extraction with pre-defined labels (e.g., NER), in which each task label is treated as a query to search for the corresponding answers in the input text (context); (2) span extraction with natural questions (e.g., EQA), in which the question is treated as the query for answer extraction from the given passage (context); (3) sequence classification with pre-defined task labels (e.g., sentiment analysis), in which each task label is used as a query for the input text (context); and (4) sequence classification with natural questions on multiple choices (e.g., multi-choice QA, MCQA), in which the concatenation of the question and one choice is treated as the query for the given passage (context). In the output space, we tackle span extraction problems by predicting the probability of each context span being the answer, and sequence classification problems by conducting relevance classification on [CLS] (extracting [CLS] if the query and the context are relevant).
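The four input formats can be sketched as plain (query, context) pairs. This is only an illustration of the formulation described above, not the code from the repository: the function names and the label query wordings are hypothetical, and tokenization and model calls are omitted.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (query, context)


def span_extraction_with_labels(text: str, label_queries: Dict[str, str]) -> Dict[str, Pair]:
    """(1) e.g. NER: each task label becomes one query over the same input text."""
    return {label: (query, text) for label, query in label_queries.items()}


def span_extraction_with_question(question: str, passage: str) -> Pair:
    """(2) e.g. EQA: the natural question is the query, the passage is the context."""
    return (question, passage)


def classification_with_labels(text: str, label_queries: Dict[str, str]) -> Dict[str, Pair]:
    """(3) e.g. sentiment analysis: each label becomes one query; relevance is read from [CLS]."""
    return {label: (query, text) for label, query in label_queries.items()}


def classification_with_choices(question: str, choices: List[str], passage: str) -> List[Pair]:
    """(4) e.g. MCQA: the query is the question concatenated with one choice."""
    return [(f"{question} {choice}", passage) for choice in choices]


# Hypothetical label queries; the wording used in practice is a design choice.
ner_pairs = span_extraction_with_labels(
    "Wai Lam works in Hong Kong.",
    {"PER": "person", "LOC": "location"},
)
mcqa_pairs = classification_with_choices(
    "Where does the story take place?",
    ["In a city.", "On a farm."],
    "The cows were already waiting by the barn when the sun came up.",
)

for query, context in mcqa_pairs:
    print(query, "|", context)
```

For categories (1) and (2) the answer spans are extracted from the context of each pair; for categories (3) and (4) each pair is scored for relevance, so one input text yields as many pairs as there are labels or choices.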

Model variations

Three model versions have been released:

Model	Backbone	#params
PMR-base	roberta-base	125M
PMR-large (this checkpoint)	roberta-large	355M
PMR-xxlarge	albert-xxlarge-v2	235M

Intended uses & limitations

The models need to be fine-tuned on data from the downstream tasks. During fine-tuning, no task-specific layer is required.

How to use

You can try the code from this repo.

BibTeX entry and citation info

@article{xu2022clozing,
  title={From Clozing to Comprehending: Retrofitting Pre-trained Language Model to Pre-trained Machine Reader},
  author={Xu, Weiwen and Li, Xin and Zhang, Wenxuan and Zhou, Meng and Bing, Lidong and Lam, Wai and Si, Luo},
  journal={arXiv preprint arXiv:2212.04755},
  year={2022}
}