---
license: gpl-3.0
language:
- en
metrics:
- accuracy
base_model: dmis-lab/ANGEL_pretrained
---

# Model Card for ANGEL_ncbi
This model card describes ANGEL_ncbi, a generative model for biomedical entity linking.


# Model Details

#### Model Description
- **Developed by:** Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, Jaewoo Kang
- **Model type:** Generative Biomedical Entity Linking Model
- **Language(s):** English
- **License:** GPL-3.0
- **Finetuned from model:** dmis-lab/ANGEL_pretrained (BART-large architecture)

#### Model Sources

- **Github Repository:** https://github.com/dmis-lab/ANGEL
- **Paper:** https://arxiv.org/pdf/2408.16493


# Direct Use
ANGEL_ncbi is designed for biomedical entity linking, with a focus on identifying and linking disease mentions in the NCBI-disease dataset.
To use this model, you need to set up a virtual environment and the inference code.
Start by cloning our [ANGEL GitHub repository](https://github.com/dmis-lab/ANGEL). 
Then, run the following script to set up the environment:
```bash
bash script/environment/set_environment.sh
```

Running the model on a single sample requires no preprocessing.
Simply execute the `run_sample.sh` script:

```bash
bash script/inference/run_sample.sh ncbi
```

To modify the sample with your own example, refer to the [Direct Use](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#direct-use) section in our GitHub repository.
If you're interested in training or evaluating the model, check out the [Fine-tuning](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#fine-tuning) and [Evaluation](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#evaluation) sections.

# Training

#### Training Data
The model was trained on the NCBI-disease dataset, which includes annotated disease entities.

#### Training Procedure
- **Positive-only Pre-training:** Initial training using only positive examples, following the standard approach.
- **Negative-aware Training:** Subsequent training that incorporates negative examples to improve the model's discriminative capabilities.

# Evaluation

### Testing Data
The model was evaluated on the NCBI-disease dataset.

### Metrics
Accuracy at Top-1 (Acc@1): The percentage of mentions for which the model's top-ranked prediction matches the correct entity.
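
As a rough illustration of the metric, the sketch below computes Acc@1 from ranked predictions. It is not the evaluation code from the ANGEL repository: the function name `accuracy_at_1`, the input layout (one ranked list of candidate identifiers per mention, one set of gold identifiers per mention), and the placeholder identifiers are all assumptions made for this example.

```python
from typing import Sequence, Set


def accuracy_at_1(
    predictions: Sequence[Sequence[str]],
    gold: Sequence[Set[str]],
) -> float:
    """Return the percentage of mentions whose top-ranked prediction is correct.

    predictions[i] is the ranked candidate list for mention i (best first).
    gold[i] is the set of acceptable identifiers for mention i.
    Both layouts are assumptions made for this sketch.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not predictions:
        return 0.0
    # A mention counts as a hit only if it has at least one candidate
    # and the first candidate is among the gold identifiers.
    hits = sum(
        1
        for ranked, answers in zip(predictions, gold)
        if ranked and ranked[0] in answers
    )
    return 100.0 * hits / len(predictions)


# Placeholder identifiers, not real ontology entries.
example_predictions = [["ID_A", "ID_B"], ["ID_C"], ["ID_D"]]
example_gold = [{"ID_A"}, {"ID_X"}, {"ID_D", "ID_E"}]
score = accuracy_at_1(example_predictions, example_gold)
```

Here two of the three mentions have a correct top prediction, so `score` is two thirds expressed as a percentage. Treating the gold answer as a set is a design choice for the sketch, so that a mention annotated with several acceptable identifiers counts as correct if any of them is ranked first.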

### Scores

<table border="1" cellspacing="0" cellpadding="5" style="width: 100%; text-align: center; border-collapse: collapse; margin-left: 0;">
  <thead>
    <tr>
      <th><b>Dataset</b></th>
      <th><b>BioSYN</b><br>(Sung et al., 2020)</th>
      <th><b>SapBERT</b><br>(Liu et al., 2021)</th>
      <th><b>GenBioEL</b><br>(Yuan et al., 2022b)</th>
      <th><b>ANGEL<br>(Ours)</b></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="width: 20%;"><b>NCBI</b></td>
      <td style="width: 20%;">91.1</td>
      <td style="width: 20%;">92.3</td>
      <td style="width: 20%;">91.0</td>
      <td style="width: 20%;"><b>92.8</b></td>
    </tr>
  </tbody>
</table>

The GenBioEL score is a reproduced result.


# Citation
If you use the ANGEL_ncbi model, please cite:

```bibtex
@article{kim2024learning,
  title={Learning from Negative Samples in Generative Biomedical Entity Linking},
  author={Kim, Chanhwi and Kim, Hyunjae and Park, Sihyeon and Lee, Jiwoo and Sung, Mujeen and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2408.16493},
  year={2024}
}
```

# Contact
For questions or issues, please contact chanhwi_kim@korea.ac.kr.