Update README.md
README.md
CHANGED
@@ -18,9 +18,11 @@ AmpGPT2 is a language model capable of generating de novo antimicrobial peptides
|
|
18 |
|
19 |
AmpGPT2 is a fine-tuned version of [nferruz/ProtGPT2](https://huggingface.co/nferruz/ProtGPT2) based on the GPT2 Transformer architecture.
|
20 |
|
21 |
-
|
22 |
-
|
|
|
23 |
|
|
|
24 |
|
25 |
## Training and evaluation data
|
26 |
|
@@ -30,7 +32,7 @@ AmpGPT2 was trained using 32014 AMP sequences from the Compass (https://compass.
|
|
30 |
|
31 |
The example code below uses the generation settings that worked best during testing.
|
32 |
The `num_return_sequences` parameter specifies the number of sequences to generate. When generating more than 100 sequences at once, I recommend doing it in batches.
|
33 |
-
The results can then be checked with the peptide scanner
|
34 |
```
|
35 |
from transformers import pipeline
|
36 |
from transformers import GPT2LMHeadModel, GPT2Tokenizer
|
@@ -49,7 +51,7 @@ for i, seq in enumerate(amp_sequences):
|
|
49 |
print(f">{sequence_identifier}\n{sequence}")
|
50 |
```
|
51 |
|
52 |
-
### Training hyperparameters
|
53 |
|
54 |
The following hyperparameters were used during training:
|
55 |
- learning_rate: 1e-05
|
@@ -60,14 +62,24 @@ The following hyperparameters were used during training:
|
|
60 |
- lr_scheduler_type: linear
|
61 |
- num_epochs: 50.0
|
62 |
|
63 |
-
|
64 |
-
|
65 |
-
|
66 |
-
|
67 |
-
|
68 |
-
|
69 |
-
|
70 |
-
|
71 |
|
72 |
### Framework versions
|
73 |
|
@@ -75,3 +87,5 @@ The model was trained on four NVIDIA A100 GPUs.
|
|
75 |
- PyTorch 2.2.0+cu121
|
76 |
- Datasets 2.16.1
|
77 |
- Tokenizers 0.15.0
|
|
|
|
|
|
18 |
|
19 |
AmpGPT2 is a fine-tuned version of [nferruz/ProtGPT2](https://huggingface.co/nferruz/ProtGPT2) based on the GPT2 Transformer architecture.
|
20 |
|
21 |
+
| Training Loss | Epoch | Validation Loss | Accuracy |
|
22 |
+
|:-------------:|:-----:|:---------------:|:--------:|
|
23 |
+
| 3.7948 | 50.0 | 3.9890 | 0.4213 |
|
24 |
|
25 |
+
To validate the results, the generated sequences were checked with the Antimicrobial Peptide Scanner vr.2 (https://www.dveltri.com/ascan/v2/ascan.html), a deep learning tool designed specifically for AMP recognition.
|
26 |
|
27 |
## Training and evaluation data
|
28 |
|
|
|
32 |
|
33 |
The example code below uses the generation settings that worked best during testing.
|
34 |
The `num_return_sequences` parameter specifies the number of sequences to generate. When generating more than 100 sequences at once, I recommend doing it in batches.
|
35 |
+
The results can then be checked with the peptide scanner.
|
36 |
```
|
37 |
from transformers import pipeline
|
38 |
from transformers import GPT2LMHeadModel, GPT2Tokenizer
|
|
|
51 |
print(f">{sequence_identifier}\n{sequence}")
|
52 |
```
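The batching recommendation above can be sketched as follows. This is a minimal illustration, not the author's code: `batch_sizes` and `generate_in_batches` are hypothetical helper names, and `generate` stands in for whatever call produces sequences in the example (for instance, a wrapper around the pipeline call that passes `num_return_sequences=n`).

```python
from typing import Callable, Iterator, List


def batch_sizes(total: int, batch_size: int = 100) -> Iterator[int]:
    """Yield chunk sizes that sum to `total`, each at most `batch_size`."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    remaining = total
    while remaining > 0:
        step = min(batch_size, remaining)
        yield step
        remaining -= step


def generate_in_batches(
    generate: Callable[[int], List[str]],
    total: int,
    batch_size: int = 100,
) -> List[str]:
    """Call `generate(n)` repeatedly until `total` sequences are collected.

    `generate` is an assumed callable that returns `n` sequences per call.
    """
    sequences: List[str] = []
    for n in batch_sizes(total, batch_size):
        sequences.extend(generate(n))
    return sequences
```

With a stand-in generator such as `lambda n: ["M"] * n`, `generate_in_batches(..., total=250)` makes three calls (100, 100, and 50 sequences) and returns 250 items in total.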
|
53 |
|
54 |
+
### Training hyperparameters and results
|
55 |
|
56 |
The following hyperparameters were used during training:
|
57 |
- learning_rate: 1e-05
|
|
|
62 |
- lr_scheduler_type: linear
|
63 |
- num_epochs: 50.0
|
64 |
|
65 |
+
\begin{table}[h!]
|
66 |
+
\centering
|
67 |
+
\caption{AMP Yield Comparison between AmpGPT2 and ProtGPT2}
|
68 |
+
\begin{tabular}{lccc}
|
69 |
+
\toprule
|
70 |
+
Model & Total Sequences & AMP Classified & AMP Percentage (AMP\%) \\
|
71 |
+
\midrule
|
72 |
+
AmpGPT2 & 10000 & 9541 & 95.41\% \\
|
73 |
+
ProtGPT2 & 10000 & 5530 & 55.30\% \\
|
74 |
+
\bottomrule
|
75 |
+
\end{tabular}
|
76 |
+
\label{tab:amp_yield}
|
77 |
+
\end{table}
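The AMP percentage column in the table above can be reproduced from the two count columns. The sketch below assumes the percentage is simply the number of AMP-classified sequences divided by the total, times 100; the helper name `amp_percentage` is hypothetical and not part of the repository.

```python
def amp_percentage(amp_classified: int, total: int) -> float:
    """Return the share of sequences classified as AMPs, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= amp_classified <= total:
        raise ValueError("amp_classified must be between 0 and total")
    # Multiply before dividing so the intermediate value stays an exact integer.
    return round(100 * amp_classified / total, 2)


# Counts taken from the table above.
print(amp_percentage(9541, 10000))  # 95.41
print(amp_percentage(5530, 10000))  # 55.3
```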
|
78 |
+
|
79 |
+
| Model | AMP% | Length |
|
80 |
+
|:-------:|:-----:|:-------:|
|
81 |
+
| AmpGPT2 | 95.86 | 64.08 |
|
82 |
+
| ProtGPT2 | 51.85 | 222.59 |
|
83 |
|
84 |
### Framework versions
|
85 |
|
|
|
87 |
- PyTorch 2.2.0+cu121
|
88 |
- Datasets 2.16.1
|
89 |
- Tokenizers 0.15.0
|
90 |
+
|
91 |
+
The model was trained on four NVIDIA A100 GPUs.
|