
StructBERT: Unofficial Copy

Official Repository Link: https://github.com/alibaba/AliceMind/tree/main/StructBERT

Disclaimer

This is not an official release; the checkpoints here were converted from the official StructBERT weights and re-uploaded to the Hugging Face Hub.

Reproducing the Hugging Face Hub models:

Download the config, tokenizer vocab, and pre-trained weights:

wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/large_bert_config.json && mv large_bert_config.json config.json  # model config
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/vocab.txt  # tokenizer vocab
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model && mv en_model pytorch_model.bin  # pre-trained weights

Then load the files with transformers and push them to the Hub:
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

# Load the downloaded config, weights, and vocab from the current directory
config = AutoConfig.from_pretrained("./config.json")
model = AutoModelForMaskedLM.from_pretrained(".", config=config)
tokenizer = AutoTokenizer.from_pretrained(".", config=config)

# Upload model and tokenizer to the Hugging Face Hub
model.push_to_hub("structbert-large")
tokenizer.push_to_hub("structbert-large")
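
To sanity-check the converted checkpoint, a fill-mask pipeline can be run against the pushed repository. This is only a sketch; "your-username/structbert-large" is a placeholder for whatever namespace the model was actually pushed to.

from transformers import pipeline

# Placeholder repo id; replace with the namespace the model was pushed to
fill_mask = pipeline("fill-mask", model="your-username/structbert-large")

# BERT-style checkpoints use the [MASK] token
print(fill_mask("The capital of France is [MASK]."))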

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Paper: https://arxiv.org/abs/1908.04577

Introduction

We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively.
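
The word-level auxiliary task described in the paper shuffles short spans of tokens (trigrams) and trains the model to reconstruct their original order. The snippet below is only a toy illustration of that corruption step, not the authors' implementation; the span length k=3 follows the paper.

import random

def shuffle_trigram(tokens, k=3):
    """Shuffle one random k-token span; the original span is the prediction target."""
    if len(tokens) < k:
        return tokens, None, None
    start = random.randrange(len(tokens) - k + 1)
    span = tokens[start:start + k]
    shuffled = span[:]
    random.shuffle(shuffled)
    return tokens[:start] + shuffled + tokens[start + k:], start, span

tokens = "the quick brown fox jumps over".split()
corrupted, start, original = shuffle_trigram(tokens)
print(corrupted, "-> model must restore:", original)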

Pre-trained models

Model | Description | #params | Download
structbert.en.large | StructBERT using the BERT-large architecture | 340M | structbert.en.large
structroberta.en.large | StructRoBERTa, continued training from RoBERTa | 355M | Coming soon
structbert.ch.large | Chinese StructBERT; BERT-large architecture | 330M | structbert.ch.large

Results

The GLUE and CLUE results below can be reproduced with the hyperparameters listed in the "Example usage" section.

structbert.en.large

GLUE benchmark

Model | MNLI | QNLIv2 | QQP | SST-2 | MRPC
structbert.en.large | 86.86% | 93.04% | 91.67% | 93.23% | 86.51%

structbert.ch.large

CLUE benchmark

Model | CMNLI | OCNLI | TNEWS | AFQMC
structbert.ch.large | 84.47% | 81.28% | 68.67% | 76.11%

Example usage

Requirements and Installation

  • PyTorch version >= 1.0.1

  • Install other libraries via

    pip install -r requirements.txt

  • For faster training, install NVIDIA's apex library

Finetune MNLI

python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir path_to_glue_data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint path_to_pretrained_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir path_to_output_dir 
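
The command above uses the AliceMind training script. For reference, a rough equivalent with the Hugging Face Trainer is sketched below, reusing the hyperparameters above (batch size 32, learning rate 2e-5, 3 epochs, max sequence length 128); "your-username/structbert-large" is again a placeholder for the converted checkpoint, not an official repo id.

import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "your-username/structbert-large"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

# MNLI from the GLUE benchmark: premise/hypothesis pairs with 3-way labels
mnli = load_dataset("glue", "mnli")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

mnli = mnli.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(
    output_dir="structbert-large-mnli",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=mnli["train"],
    eval_dataset=mnli["validation_matched"],
    tokenizer=tokenizer,            # enables dynamic padding via DataCollatorWithPadding
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())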

Citation

If you use our work, please cite:

@article{wang2019structbert,
  title={Structbert: Incorporating language structures into pre-training for deep language understanding},
  author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
  journal={arXiv preprint arXiv:1908.04577},
  year={2019}
}