---
extra_gated_heading: You need to share contact information with Alchemab to access this model
extra_gated_prompt: >-

  ### FAbCon Terms of Use

  FAbCon models follow a [modified Apache 2.0
  license](https://huggingface.co/alchemab/fabcon-large/blob/main/LICENSE.md)
extra_gated_fields:
  First Name: text
  Last Name: text
  Email: text
  Organization: text
  By clicking 'Submit' below, I accept the terms of the license and agree to share contact information with Alchemab: checkbox
  I agree to being contacted about future products, services, and/or partnership opportunities: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed, and shared
  in accordance with the [Alchemab Privacy
  Notice](https://www.alchemab.com/privacy-policy/).
extra_gated_button_content: Submit
license: other
widget:
  - text: ḢQVQLE
tags:
  - biology
---

# FAbCon-medium 🦅🧬

FAbCon is a generative, antibody-specific language model based on the Falcon architecture. It is pre-trained using causal language modelling and is suitable for a range of antibody sequence tasks, such as de novo sequence generation and sequence property prediction (see the usage examples below). FAbCon-small, FAbCon-medium, and FAbCon-large are available for non-commercial use under a modified Apache 2.0 license. For users seeking commercial use of our models (and a license for antibodies generated by any FAbCon model), please contact us.

| Model variant | Parameters | Config (layers / heads / hidden dim) | License |
|---|---|---|---|
| FAbCon-small | 144M | 24L, 12H, 768d | Modified Apache 2.0 |
| FAbCon-medium | 297M | 28L, 16H, 1024d | Modified Apache 2.0 |
| FAbCon-large | 2.4B | 56L, 32H, 2048d | Modified Apache 2.0 |
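
All three checkpoints load with the standard `transformers` API once the gated-access terms above have been accepted. A minimal loading sketch is shown below; the `fabcon-small` repository ID is assumed to follow the same `alchemab/...` naming as the other variants:

```python
from transformers import PreTrainedTokenizerFast, FalconForCausalLM

# Swap in "alchemab/fabcon-small" or "alchemab/fabcon-large" for the other
# variants (repository IDs assumed from the naming used on this model card).
model_id = "alchemab/fabcon-medium"

tokenizer = PreTrainedTokenizerFast.from_pretrained(model_id)
model = FalconForCausalLM.from_pretrained(model_id)

# Sanity-check against the parameter counts in the table above (~297M here)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```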

## Usage example - generation

Generating sequences can be done with Hugging Face's built-in `model.generate` method:

```python
from transformers import (
    PreTrainedTokenizerFast,
    FalconForCausalLM
)

tokenizer = PreTrainedTokenizerFast.from_pretrained("alchemab/fabcon-medium")
model = FalconForCausalLM.from_pretrained("alchemab/fabcon-medium")

# Prompt with the start token "Ḣ"; the last token of the encoding (e.g. a
# trailing special token added by the tokenizer) is sliced off so generation
# continues the sequence.
o = model.generate(
    tokenizer("Ḣ", return_tensors='pt')['input_ids'][:, :-1],
    max_new_tokens=...,
    top_k=...,
    temperature=...
)

# Decode generated token IDs back into amino-acid sequences
decoded_seq = tokenizer.batch_decode(o)
```
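
Note that `top_k` and `temperature` only affect generation when sampling is enabled via `do_sample=True`. The sketch below shows one possible sampling setup; the specific values are illustrative assumptions rather than recommended settings from the FAbCon release:

```python
# Illustrative sampling settings; tune max_new_tokens, top_k and temperature
# for your own use case.
output_ids = model.generate(
    tokenizer("Ḣ", return_tensors='pt')['input_ids'][:, :-1],
    do_sample=True,        # required for top_k / temperature to take effect
    max_new_tokens=140,    # assumption: roughly the length of a variable domain
    top_k=50,
    temperature=1.0,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```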

## Usage example - sequence property prediction

Sequence properties can be predicted with the `transformers` built-in sequence classification classes:

```python
from transformers import (
    PreTrainedTokenizerFast,
    FalconForSequenceClassification
)

tokenizer = PreTrainedTokenizerFast.from_pretrained("alchemab/fabcon-medium")
model = FalconForSequenceClassification.from_pretrained("alchemab/fabcon-medium")

# Tokenize the input sequence once, then run a forward pass to obtain logits
inputs = tokenizer("Ḣ", return_tensors='pt')
o = model(input_ids=inputs['input_ids'],
          attention_mask=inputs['attention_mask'])
```
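
The classification head attached by `FalconForSequenceClassification` is newly initialised, so it needs fine-tuning on labelled sequences before its predictions are meaningful. Below is a minimal sketch of a single supervised step, assuming a binary sequence property; the `num_labels` value, example sequence, and label are purely illustrative:

```python
import torch
from transformers import PreTrainedTokenizerFast, FalconForSequenceClassification

tokenizer = PreTrainedTokenizerFast.from_pretrained("alchemab/fabcon-medium")
model = FalconForSequenceClassification.from_pretrained(
    "alchemab/fabcon-medium",
    num_labels=2,  # illustrative: a binary sequence property
)

# Toy single-example batch; real fine-tuning would iterate over a labelled dataset
inputs = tokenizer("ḢQVQLE", return_tensors='pt')
labels = torch.tensor([1])

out = model(input_ids=inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            labels=labels)   # returns logits and a cross-entropy loss
out.loss.backward()          # plug into an optimizer step or a Trainer loop
```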