---
datasets:
- adsabs/WIESP2022-NER
language:
- en
tags:
- physics
- computer science
---

PCSciBERT_cased was initialized from the cased variant of SciBERT (https://huggingface.co/allenai/scibert_scivocab_cased) and further pre-trained on text from 1,560,661 arXiv research articles in the physics and computer science domains.
The tokenizer for PCSciBERT_cased uses the same vocabulary as allenai/scibert_scivocab_cased.

The model was also evaluated on downstream named entity recognition using the adsabs/WIESP2022-NER and CS-NER (https://github.com/jd-coderepos/contributions-ner-cs/tree/main) datasets. Overall, PCSciBERT_cased achieved higher micro F1 scores than SciBERT (cased) on both WIESP (Micro F1: 82.19%) and CS-NER (Micro F1: 76.22%).
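
For readers unfamiliar with the metric, the sketch below shows how a micro-averaged F1 score is commonly computed: true positives, false positives, and false negatives are pooled across all entity types before F1 is calculated. This is only an illustration of the general definition and not necessarily the exact evaluation script used for the scores above. The function name `micro_f1`, the entity type names, and the counts are all hypothetical.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one entity type
Counts = Tuple[int, int, int]


def micro_f1(per_type_counts: Dict[str, Counts]) -> float:
    """Micro-averaged F1: pool the counts over all entity types, then compute F1."""
    tp = sum(c[0] for c in per_type_counts.values())
    fp = sum(c[1] for c in per_type_counts.values())
    fn = sum(c[2] for c in per_type_counts.values())
    denominator = 2 * tp + fp + fn
    # With no predictions and no gold entities, report 0.0 instead of dividing by zero.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts for two made-up entity types (not taken from WIESP or CS-NER).
toy_counts = {
    "EntityTypeA": (8, 2, 1),
    "EntityTypeB": (1, 1, 3),
}

print(f"Micro F1: {micro_f1(toy_counts):.2%}")  # prints "Micro F1: 72.00%"
```

Because the counts are pooled, frequent entity types weigh more heavily in the micro average than rare ones.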

It improves on the performance of SciBERT (cased) by 0.69% on the CS-NER test dataset and by 1.49% on the WIESP test dataset.