Update README.md
README.md
CHANGED
@@ -1,3 +1,103 @@
---
license: gpl-3.0
language:
- en
metrics:
- accuracy
base_model: dmis-lab/ANGEL_pretrained
---

# Model Card for ANGEL_bc5cdr
This model card provides detailed information about the ANGEL_bc5cdr model, designed for biomedical entity linking.


# Model Details

#### Model Description
- **Developed by:** Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, Jaewoo Kang
- **Model type:** Generative Biomedical Entity Linking Model
- **Language(s):** English
- **License:** GPL-3.0
- **Finetuned from model:** BART-large (Base architecture)

#### Model Sources

- **GitHub Repository:** https://github.com/dmis-lab/ANGEL
- **Paper:** https://arxiv.org/pdf/2408.16493


# Direct Use
ANGEL_bc5cdr is designed for biomedical entity linking, with a focus on identifying and linking disease mentions in the BC5CDR dataset.
To use this model, you need to set up a virtual environment and the inference code.
Start by cloning our [ANGEL GitHub repository](https://github.com/dmis-lab/ANGEL).
Then, run the following script to set up the environment:
```bash
bash script/environment/set_environment.sh
```

To run the model on a single sample, no preprocessing is required.
Simply execute the `run_sample.sh` script:

```bash
bash script/inference/run_sample.sh bc5cdr
```

To replace the sample with your own example, refer to the [Direct Use](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#direct-use) section in our GitHub repository.
If you're interested in training or evaluating the model, see the [Fine-tuning](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#fine-tuning) and [Evaluation](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#evaluation) sections.
# Training

#### Training Data
The model was trained on the BC5CDR dataset, which includes annotated disease entities.

#### Training Procedure
- **Positive-only Pre-training:** The model is first trained using only positive examples, following the standard approach.
- **Negative-aware Training:** Subsequent training incorporates negative examples to improve the model's discriminative capabilities.

# Evaluation

### Testing Data
The model was evaluated on the BC5CDR dataset.

### Metrics
- **Accuracy at Top-1 (Acc@1):** The percentage of mentions for which the model's top prediction matches the correct entity.
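
As a rough illustration of the Acc@1 definition above, the following minimal sketch computes the metric from ranked candidate lists. The function name `accuracy_at_1`, its input layout, and the identifiers in the toy data are assumptions made for this example; the evaluation script in the ANGEL repository may handle details such as multiple gold identifiers per mention differently.

```python
def accuracy_at_1(ranked_predictions, gold_ids):
    """Return Acc@1 as a percentage.

    ranked_predictions: one list of candidate entity IDs per mention, best first.
    gold_ids: the correct entity ID for each mention, in the same order.
    (Hypothetical helper for illustration; not part of the ANGEL codebase.)
    """
    if len(ranked_predictions) != len(gold_ids):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_ids:
        return 0.0
    # A mention counts as correct only if its top-ranked candidate is the gold ID.
    hits = sum(
        1
        for candidates, gold in zip(ranked_predictions, gold_ids)
        if candidates and candidates[0] == gold
    )
    return 100.0 * hits / len(gold_ids)


# Toy example with made-up identifiers (not real BC5CDR data).
predictions = [["D001", "D002"], ["D005", "D003"], ["C010"], ["D007", "D008"]]
gold = ["D001", "D003", "C010", "D007"]
print(accuracy_at_1(predictions, gold))  # 75.0
```

In the toy data, the second mention ranks the correct entity second rather than first, so it does not count toward Acc@1 and three of the four mentions are scored as correct.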

### Scores

<table border="1" cellspacing="0" cellpadding="5" style="width: 100%; text-align: center; border-collapse: collapse; margin-left: 0;">
<thead>
<tr>
<th><b>Dataset</b></th>
<th><b>BioSYN</b><br>(Sung et al., 2020)</th>
<th><b>SapBERT</b><br>(Liu et al., 2021)</th>
<th><b>GenBioEL</b><br>(Yuan et al., 2022b)</th>
<th><b>ANGEL<br>(Ours)</b></th>
</tr>
</thead>
<tbody>
<tr>
<td><b>BC5CDR</b></td>
<td>-</td>
<td>-</td>
<td>93.1</td>
<td><b>94.5</b></td>
</tr>
</tbody>
</table>
The GenBioEL scores were reproduced.

We omit scores for BioSYN and SapBERT because they were evaluated separately on the chemical and disease subsets, which differs from our setting.

# Citation
If you use the ANGEL_bc5cdr model, please cite:

```bibtex
@article{kim2024learning,
  title={Learning from Negative Samples in Generative Biomedical Entity Linking},
  author={Kim, Chanhwi and Kim, Hyunjae and Park, Sihyeon and Lee, Jiwoo and Sung, Mujeen and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2408.16493},
  year={2024}
}
```

# Contact
For questions or issues, please contact chanhwi_kim@korea.ac.kr.