metadata

license: bsd
tags:
  - chemistry
  - biology
  - protein
  - antibodies
  - antibody
  - heavy chain
  - AbLang
  - CDR
  - OAS

AbLang model for heavy chains

This is a Hugging Face version of AbLang, a language model for antibodies. It was introduced in the paper cited below and first released in this repository. The model was trained on uppercase amino acids, so input sequences must use capital letters only.

Intended uses & limitations

The model can be used for protein feature extraction or fine-tuned on downstream tasks (TBA).

How to use

Since this is a custom model, you need to install additional dependencies:

pip install ablang

Here is how to use this model to get the features of a given antibody sequence in PyTorch:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('qilowoq/AbLang_heavy')
# trust_remote_code=True lets transformers load the custom model code
model = AutoModel.from_pretrained('qilowoq/AbLang_heavy', trust_remote_code=True)

# Separate the residues with spaces before tokenizing
sequence_Example = ' '.join("QIHLVQSGTEVKKPGSSVTVSCKAYGVNTFGLYAVNWVRQAPGQSLEYIGQIWRWKSSASHHFRGRVLISAVDLTGSSPPISSLEIKNLTSDDTAVYFCTTTSTYDKWSGLHHDGVMAFSSWGQGTLISVSAASTKGPSVFPLAPSSGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSTQTYICNVNHKPSNTKVDKKVEPK")
encoded_input = tokenizer(sequence_Example, return_tensors='pt')
# Unpack the encoding so its tensors are passed as keyword arguments
model_output = model(**encoded_input)

Sequence embeddings can be produced as follows:

seq_embs = model_output.last_hidden_state[:, 0, :]
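
Because the model only works with capital-letter amino acids and the example above passes residues separated by spaces, raw input can be normalized before it reaches the tokenizer. The following is a minimal sketch, not part of ablang or transformers: `prepare_sequence` and `STANDARD_AMINO_ACIDS` are hypothetical names, and restricting input to the 20 standard residues is an assumption rather than a documented requirement of the model.

```python
# Hypothetical helper; not part of ablang or transformers.
# Assumption: only the 20 standard amino acid letters are accepted.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def prepare_sequence(raw: str) -> str:
    """Uppercase a raw sequence and separate its residues with single spaces."""
    # Remove spaces and line breaks, then force capital letters.
    residues = "".join(raw.split()).upper()
    if not residues:
        raise ValueError("empty sequence")
    # Reject anything outside the assumed alphabet.
    unknown = sorted(set(residues) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected characters: {unknown}")
    return " ".join(residues)


print(prepare_sequence("qihlv qsgte"))  # prints "Q I H L V Q S G T E"
```

The returned string can be passed to the tokenizer in place of the manual `' '.join(...)` call.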

Citation

@article{Olsen2022,
  title={AbLang: An antibody language model for completing antibody sequences},
  author={Tobias H. Olsen and Iain H. Moal and Charlotte M. Deane},
  journal={bioRxiv},
  doi={10.1101/2022.01.20.477061},
  year={2022}
}