	---
	license: apache-2.0
	base_model: nferruz/ProtGPT2
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: AmpGPT2
	  results: []
	---


	# AmpGPT2

	AmpGPT2 is a language model capable of generating de novo antimicrobial peptides (AMPs). Generated sequences are predicted to be AMPs 95.83% of the time.

	## Model description

	AmpGPT2 is a fine-tuned version of [nferruz/ProtGPT2](https://huggingface.co/nferruz/ProtGPT2), which is based on the GPT2 Transformer architecture.
	The results were validated with the Antimicrobial Peptide Scanner vr.2 (https://www.dveltri.com/ascan/v2/ascan.html).

	## Training and evaluation data

	AmpGPT2 was trained using 32014 AMP sequences from the Compass (https://compass.mathematik.uni-marburg.de/) database.

	## How to use AmpGPT2

	```
	from transformers import pipeline
	from transformers import GPT2LMHeadModel, GPT2Tokenizer

	# Text-generation pipeline backed by the fine-tuned model
	ampgpt2 = pipeline('text-generation', model="wabu/AmpGPT2")

	# The model and tokenizer can also be loaded directly
	model_amp = GPT2LMHeadModel.from_pretrained('wabu/AmpGPT2')
	tokenizer_amp = GPT2Tokenizer.from_pretrained('wabu/AmpGPT2')

	# Sample 10 sequences from an empty prompt
	amp_sequences = ampgpt2(
	    "",
	    do_sample=True,
	    repetition_penalty=1.2,
	    num_return_sequences=10,
	    eos_token_id=0
	)

	# Print the sequences in FASTA format
	for i, seq in enumerate(amp_sequences):
	    sequence_identifier = f"Sequence_{i + 1}"
	    sequence = seq['generated_text'].replace('','').strip()
	    print(f">{sequence_identifier}\n{sequence}")
	```

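Generated text may contain line breaks or characters outside the 20 standard amino acids, so it can help to clean sequences before passing them to a downstream predictor. The sketch below is a hypothetical post-processing step and not part of the original usage example: `clean_sequence`, `is_valid_peptide`, and `to_fasta` are illustrative names, and restricting output to the 20 standard residues is an assumption.

```python
# Hypothetical helpers for post-processing generated peptides (not from the model card).
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard residues only


def clean_sequence(raw: str) -> str:
    """Remove all whitespace (including line breaks) and upper-case the sequence."""
    return "".join(raw.split()).upper()


def is_valid_peptide(seq: str) -> bool:
    """Return True if seq is non-empty and uses only standard amino-acid letters."""
    return bool(seq) and set(seq) <= STANDARD_AMINO_ACIDS


def to_fasta(sequences: list[str]) -> str:
    """Format cleaned, valid sequences as FASTA records named Sequence_1, Sequence_2, ..."""
    records = []
    for raw in sequences:
        seq = clean_sequence(raw)
        if is_valid_peptide(seq):
            records.append(f">Sequence_{len(records) + 1}\n{seq}")
    return "\n".join(records)


# Made-up strings standing in for seq['generated_text'] values.
example = ["GLFDIVKK\nVVGAL", "  kwklfkki  ", "XX123"]
fasta = to_fasta(example)
print(fasta)
```

Invalid entries are skipped and the remaining records are numbered consecutively, so the identifiers in the FASTA output may not match the positions in the input list.
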
	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 50.0
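
With `lr_scheduler_type: linear`, the learning rate decays from its initial value toward zero over the course of training. The sketch below reproduces that decay in plain Python. It assumes no warmup steps (none are listed here) and a hypothetical total of 1,000 optimizer steps, since the actual step count is not reported.

```python
# Sketch of a linear learning-rate decay, assuming zero warmup (not stated in the card).
INITIAL_LR = 1e-05   # learning_rate from the list above
TOTAL_STEPS = 1000   # hypothetical: the real number of optimizer steps is not reported


def linear_lr(step: int, total_steps: int = TOTAL_STEPS, initial_lr: float = INITIAL_LR) -> float:
    """Learning rate at a given optimizer step under linear decay to zero."""
    remaining = max(0, total_steps - step)
    return initial_lr * remaining / total_steps


for step in (0, 250, 500, 1000):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```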

	### Training results

	These are the training loss, validation loss, and accuracy after the final epoch:

	| Training Loss | Epoch | Validation Loss | Accuracy |
	|:-------------:|:-----:|:---------------:|:--------:|
	| 3.7948        | 50.0  | 3.9890          | 0.4213   |
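
If the reported losses are mean per-token cross-entropy values in nats, they can be converted to perplexity by exponentiation. That is the usual convention for causal language-model training with the Trainer, but it is an assumption here. A minimal sketch:

```python
import math

# Final-epoch losses from the table above; treated as mean cross-entropy in nats (assumption).
training_loss = 3.7948
validation_loss = 3.9890

# Perplexity is the exponential of the mean cross-entropy.
training_perplexity = math.exp(training_loss)
validation_perplexity = math.exp(validation_loss)

print(f"training perplexity:   {training_perplexity:.1f}")
print(f"validation perplexity: {validation_perplexity:.1f}")
```

Under that assumption, the validation perplexity comes out at roughly 54.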


	### Framework versions

	- Transformers 4.38.0.dev0
	- PyTorch 2.2.0+cu121
	- Datasets 2.16.1
	- Tokenizers 0.15.0