---
license: bsd
tags:
- chemistry
- biology
- protein
- antibodies
- antibody
- heavy chain
- AbLang
- CDR
- OAS
---

# AbLang model for heavy chains

This is a Hugging Face version of AbLang, a language model for antibodies. It was introduced in [this paper](https://doi.org/10.1101/2022.01.20.477061) and first released in [this repository](https://github.com/oxpig/AbLang). The model was trained on uppercase amino acids, so it only works with sequences written in capital letters.

# Intended uses & limitations

The model can be used for protein feature extraction or fine-tuned on downstream tasks (TBA).

### How to use

Since this is a custom model, you need to install an additional dependency:

```bash
pip install ablang
```

Here is how to use this model to get the features of a given antibody sequence in PyTorch:

```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is needed because the model class is custom code
# loaded from the model repository.
tokenizer = AutoTokenizer.from_pretrained('qilowoq/AbLang_heavy')
model = AutoModel.from_pretrained('qilowoq/AbLang_heavy', trust_remote_code=True)

# Residues are passed as a space-separated string of uppercase letters.
sequence_Example = ' '.join("QIHLVQSGTEVKKPGSSVTVSCKAYGVNTFGLYAVNWVRQAPGQSLEYIGQIWRWKSSASHHFRGRVLISAVDLTGSSPPISSLEIKNLTSDDTAVYFCTTTSTYDKWSGLHHDGVMAFSSWGQGTLISVSAASTKGPSVFPLAPSSGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSTQTYICNVNHKPSNTKVDKKVEPK")
encoded_input = tokenizer(sequence_Example, return_tensors='pt')

# Unpack the tokenizer output so its tensors are passed as keyword arguments.
model_output = model(**encoded_input)
```

Sequence embeddings can be produced as follows:

```python
# Take the hidden state of the first token of each sequence.
seq_embs = model_output.last_hidden_state[:, 0, :]
```

### Citation

```
@article{Olsen2022,
  title={AbLang: An antibody language model for completing antibody sequences},
  author={Tobias H. Olsen and Iain H. Moal and Charlotte M. Deane},
  journal={bioRxiv},
  doi={10.1101/2022.01.20.477061},
  year={2022}
}
```
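
### Input preparation sketch

The usage section notes that the model only works with capital-letter amino acids and passes the sequence as a space-separated string. Below is a minimal sketch of a helper that normalizes raw input before tokenization. `prepare_sequence` and `STANDARD_AMINO_ACIDS` are hypothetical names, not part of AbLang or `transformers`, and restricting input to the 20 standard one-letter codes is an assumption; the tokenizer's actual vocabulary may accept additional symbols.

```python
# Hypothetical helper, not part of AbLang or transformers.
# Assumption: inputs use only the 20 standard one-letter amino acid codes.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def prepare_sequence(sequence: str) -> str:
    """Uppercase a raw sequence, check its residues, and space-separate them."""
    # Drop any whitespace (spaces, newlines) and convert to capital letters.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    # Reject anything outside the assumed alphabet.
    unknown = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {unknown}")
    # Insert a space between residues, as in the usage example.
    return " ".join(cleaned)


print(prepare_sequence("qihlvqsg"))  # Q I H L V Q S G
```

The returned string can then be passed to the tokenizer in place of the manually joined sequence.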