wabu committed on
Commit
f8c84db
1 Parent(s): 428c239

Update README.md

Files changed (1)
  1. README.md +26 -12
README.md CHANGED
@@ -18,9 +18,11 @@ AmpGPT2 is a language model capable of generating de novo antimicrobial peptides

 AmpGPT2 is a fine-tuned version of [nferruz/ProtGPT2](https://huggingface.co/nferruz/ProtGPT2) based on the GPT2 Transformer architecture.

- To validate the results the Antimicrobial Peptide Scanner vr.2 (https://www.dveltri.com/ascan/v2/ascan.html) was used.
- It is a deep learning tool specifically designed for AMP recognition.


 ## Training and evaluation data

@@ -30,7 +32,7 @@ AmpGPT2 was trained using 32014 AMP sequences from the Compass (https://compass.

 The example code below contains the generation settings that worked best during testing.
 The 'num_return_sequences' parameter specifies the number of sequences generated. If you need more than 100 sequences, I recommend generating them in batches.
- The results can then be checked with the peptide scanner (https://www.dveltri.com/ascan/v2/ascan.html).
 ```
 from transformers import pipeline
 from transformers import GPT2LMHeadModel, GPT2Tokenizer
@@ -49,7 +51,7 @@ for i, seq in enumerate(amp_sequences):
   print(f">{sequence_identifier}\n{sequence}")
 ```

- ### Training hyperparameters

 The following hyperparameters were used during training:
 - learning_rate: 1e-05
@@ -60,14 +62,24 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 50.0

- The model was trained on four NVIDIA A100 GPUs.
-
- ### Training results
-
- | Training Loss | Epoch | Validation Loss | Accuracy |
- |:-------------:|:-----:|:---------------:|:--------:|
- | 3.7948 | 50.0 | 3.9890 | 0.4213 |
-

 ### Framework versions

@@ -75,3 +87,5 @@ The model was trained on four NVIDIA A100 GPUs.
 - Pytorch 2.2.0+cu121
 - Datasets 2.16.1
 - Tokenizers 0.15.0


 AmpGPT2 is a fine-tuned version of [nferruz/ProtGPT2](https://huggingface.co/nferruz/ProtGPT2) based on the GPT2 Transformer architecture.

+ | Training Loss | Epoch | Validation Loss | Accuracy |
+ |:-------------:|:-----:|:---------------:|:--------:|
+ | 3.7948 | 50.0 | 3.9890 | 0.4213 |

+ To validate the results, the Antimicrobial Peptide Scanner vr.2 (https://www.dveltri.com/ascan/v2/ascan.html), a deep learning tool specifically designed for AMP recognition, was used.

 ## Training and evaluation data

 
 The example code below contains the generation settings that worked best during testing.
 The 'num_return_sequences' parameter specifies the number of sequences generated. If you need more than 100 sequences, I recommend generating them in batches.
+ The results can then be checked with the peptide scanner.
 ```
 from transformers import pipeline
 from transformers import GPT2LMHeadModel, GPT2Tokenizer
 
   print(f">{sequence_identifier}\n{sequence}")
 ```

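The middle of the generation snippet is not shown in this diff, so as a complement, here is a minimal sketch of how batched generation with `num_return_sequences` could be organised with the `transformers` text-generation pipeline, writing the output as FASTA for the scanner. The repository id `wabu/AmpGPT2`, the `<|endoftext|>` prompt, and the sampling parameters below are assumptions for illustration, not settings taken from this commit.

```
# Minimal sketch (assumptions noted above): never ask a single call for more
# than 100 sequences, then write everything out as a FASTA file.
from transformers import pipeline

generator = pipeline("text-generation", model="wabu/AmpGPT2")  # hypothetical repo id

def generate_amps(total, batch_size=100, **gen_kwargs):
    """Generate `total` peptide sequences in batches of at most `batch_size`."""
    sequences = []
    while len(sequences) < total:
        n = min(batch_size, total - len(sequences))
        outputs = generator(
            "<|endoftext|>",         # ProtGPT2-style start token (assumption)
            num_return_sequences=n,  # sequences produced by this call
            do_sample=True,          # placeholder sampling settings; use the
            max_length=100,          # values from the snippet above instead
            **gen_kwargs,
        )
        sequences.extend(
            out["generated_text"].replace("<|endoftext|>", "").replace("\n", "").strip()
            for out in outputs
        )
    return sequences

# Write a FASTA file that can be uploaded to the AMP Scanner vr.2 web tool.
with open("generated_amps.fasta", "w") as handle:
    for i, seq in enumerate(generate_amps(200)):
        handle.write(f">AmpGPT2_{i}\n{seq}\n")
```

Generating in fixed-size chunks keeps memory use roughly constant per call, which is the practical reason for the batching recommendation above.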
+ ### Training hyperparameters and results

 The following hyperparameters were used during training:
 - learning_rate: 1e-05
 - lr_scheduler_type: linear
 - num_epochs: 50.0
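For context, the listed values correspond to standard `TrainingArguments` fields. The sketch below shows one plausible way such a fine-tuning run could be set up; the dataset preparation, batch size, and anything else not listed above are assumptions, not values from this model card.

```
# Rough sketch of a Trainer setup matching the listed hyperparameters.
# Everything not listed in the model card (dataset, batch size, ...) is assumed.
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("nferruz/ProtGPT2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers have no pad token by default
model = GPT2LMHeadModel.from_pretrained("nferruz/ProtGPT2")

args = TrainingArguments(
    output_dir="AmpGPT2",
    learning_rate=1e-5,             # from the list above
    lr_scheduler_type="linear",     # from the list above
    num_train_epochs=50.0,          # from the list above
    per_device_train_batch_size=8,  # assumption: batch size is not stated above
)

# Causal-LM collator (ProtGPT2/GPT2 is autoregressive, so mlm=False).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# The tokenized Compass AMP sequences would be passed as `train_dataset`;
# their preparation is not part of this model card, so it is omitted here.
trainer = Trainer(model=model, args=args, data_collator=collator, train_dataset=None)
# trainer.train()  # would start the 50-epoch run once a real dataset is supplied
```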

+ AMP yield comparison between AmpGPT2 and ProtGPT2:
+
+ | Model | Total Sequences | AMP Classified | AMP Percentage (AMP%) |
+ |:--------:|:---------------:|:--------------:|:---------------------:|
+ | AmpGPT2 | 10000 | 9541 | 95.41% |
+ | ProtGPT2 | 10000 | 5530 | 55.3% |
+
+ | Model | AMP% | Length |
+ |:-------:|:-----:|:-------:|
+ | AmpGPT2 | 95.86 | 64.08 |
+ | ProtGPT2 | 51.85 | 222.59 |
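The yield figures above are simply the number of sequences the scanner classifies as AMP divided by the total number generated. As a hedged illustration, the snippet below tallies that percentage together with an average sequence length from a per-sequence prediction table; the CSV file name and its column names ("Sequence", "Prediction") are assumptions about how the AMP Scanner vr.2 results might be exported, not a documented format, and reading "Length" as an average length is likewise an assumption.

```
import csv

# Tally AMP yield and average peptide length from a per-sequence prediction file.
# Column names ("Sequence", "Prediction") are assumed, not taken from the scanner docs.
def summarize(csv_path):
    total, amps, length_sum = 0, 0, 0
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            total += 1
            length_sum += len(row["Sequence"])
            if row["Prediction"].strip().upper() == "AMP":
                amps += 1
    return 100.0 * amps / total, length_sum / total

amp_pct, avg_len = summarize("ampgpt2_scanner_results.csv")  # hypothetical export
print(f"AMP%: {amp_pct:.2f}  average length: {avg_len:.2f}")
```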
 
 ### Framework versions

 - Pytorch 2.2.0+cu121
 - Datasets 2.16.1
 - Tokenizers 0.15.0
+
+ The model was trained on four NVIDIA A100 GPUs.