This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized version. For instructions on how to install TransformerEngine, please refer to the official documentation.

ESM-2 (TransformerEngine-Optimized) Overview

Description:

ESM-2 is a state-of-the-art protein language model trained with a masked language modelling objective. Its transformer-based representations capture structural information and can be used downstream to predict 3D protein structures from amino acid sequences (as in ESMFold). It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input.
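
As a hedged illustration of that masked-language-modelling interface, the sketch below fills a masked residue via the Hugging Face transformers API. The small facebook/esm2_t6_8M_UR50D checkpoint is used only to keep the example light; the assumption is that every ESM-2 checkpoint, including this one, exposes the same interface.

```python
# Minimal sketch of the masked-language-modelling interface.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "facebook/esm2_t6_8M_UR50D"  # small checkpoint, chosen for speed
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

# Mask one residue in a toy sequence and ask the model for likely replacements.
sequence = f"MKTAYIAKQR{tokenizer.mask_token}IRELHDES"
inputs = tokenizer(sequence, return_tensors="pt")
with torch.inference_mode():
    logits = model(**inputs).logits

mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top = logits[0, mask_pos].softmax(dim=-1).topk(3)
for score, idx in zip(top.values[0], top.indices[0]):
    print(tokenizer.convert_ids_to_tokens(int(idx)), float(score))
```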

This version of the ESM-2 model is optimized with NVIDIA's TransformerEngine library. It is based on the original ESM-2 model from Facebook Research, and (within numerical precision) has identical weights and outputs.
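
The sketch below is one way to spot-check that claim numerically. It assumes trust_remote_code=True is needed for the TE-optimized code path and uses a loose tolerance to absorb kernel-level differences; note that loading two 15B-parameter models at once requires substantial memory, so in practice such a check may be easier with a smaller checkpoint pair.

```python
# Hedged sketch: compare the TE-optimized checkpoint against the original.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nvidia/esm2_t48_15B_UR50D")
optimized = AutoModelForMaskedLM.from_pretrained(
    "nvidia/esm2_t48_15B_UR50D",
    trust_remote_code=True,  # assumption: enables the TE-optimized code path
).eval()
original = AutoModelForMaskedLM.from_pretrained("facebook/esm2_t48_15B_UR50D").eval()

inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQ", return_tensors="pt")
with torch.inference_mode():
    a = optimized(**inputs).logits
    b = original(**inputs).logits
print(torch.allclose(a, b, atol=1e-3))  # loose tolerance for kernel differences
```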

This model is ready for commercial/non-commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It was developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA ESM-2 Model Card.

License/Terms of Use:

ESM-2 is licensed under the MIT license.

Deployment Geography:

Global

Use Case:

Protein structure prediction, specifically predicting 3D protein structures from amino acid sequences.

Release Date:

Hugging Face 07/29/2025 via https://huggingface.co/nvidia/esm2_t48_15B_UR50D

Reference(s):

Lin et al., "Evolutionary-scale prediction of atomic-level protein structure with a language model," Science 379, 1123–1130 (2023).

Model Architecture:

Architecture Type: Transformer
Network Architecture: ESM-2

This model was developed based on: ESM-2
Number of model parameters: 1.5 x 10^10

Input:

Input Type: Text (Protein Sequences)
Input Format: String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Protein sequence represented as a string of canonical amino acids, of maximum length 1022. Longer sequences are automatically truncated to this length.
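
The truncation behaviour can be made explicit at tokenization time. In the sketch below, max_length=1024 reflects an assumption that the standard ESM tokenizer adds BOS/EOS special tokens around the 1022 amino acids.

```python
# Sketch: enforcing the 1022-residue limit explicitly at tokenization time.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nvidia/esm2_t48_15B_UR50D")
long_sequence = "M" + "A" * 5000  # toy sequence well past the limit
inputs = tokenizer(
    long_sequence,
    truncation=True,
    max_length=1024,  # 1022 residues + BOS/EOS (assumption about special tokens)
    return_tensors="pt",
)
print(inputs.input_ids.shape)  # torch.Size([1, 1024])
```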

Output:

Output Type: Embeddings (Amino acid and sequence-level)
Output Format: Vector
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Numeric vectors of floating-point values, one embedding per amino acid in the input protein sequence. Maximum output length is 1022 embeddings (one embedding vector per amino acid).
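
One way to obtain both output granularities is sketched below: the final hidden states give one vector per amino acid, and mean pooling over residue positions (an illustrative choice, not something this card prescribes) collapses them into a single sequence-level embedding. Loading the checkpoint as a plain encoder via AutoModel is an assumption.

```python
# Sketch: per-residue and sequence-level embeddings from the encoder outputs.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nvidia/esm2_t48_15B_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()

inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQ", return_tensors="pt")
with torch.inference_mode():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)

per_residue = hidden[0, 1:-1]             # drop BOS/EOS: one vector per residue
sequence_embedding = per_residue.mean(0)  # (hidden_dim,), mean pooling as one choice
```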

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
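
As a minimal sketch of GPU inference, the configuration below loads the model in bfloat16 on a CUDA device; the dtype and device choices are reasonable defaults on Ampere-or-newer GPUs, not requirements stated by this card.

```python
# Sketch: GPU inference in bfloat16 (assumed-reasonable, not mandated, settings).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "nvidia/esm2_t48_15B_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda").eval()

inputs = tokenizer("MKTAYIAKQR", return_tensors="pt").to("cuda")
with torch.inference_mode():
    logits = model(**inputs).logits
```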

Software Integration:

Runtime Engine(s):

  • Hugging Face Transformers

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Blackwell
  • NVIDIA Hopper

Supported Operating System(s):

  • Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):

Several ESM-2 checkpoints are available in a range of sizes. Larger checkpoints achieve better accuracy but require more memory and time to train (a loading sketch follows the table):

Checkpoint name        Num layers   Num parameters
esm2_t48_15B_UR50D     48           15B
esm2_t36_3B_UR50D      36           3B
esm2_t33_650M_UR50D    33           650M
esm2_t30_150M_UR50D    30           150M
esm2_t12_35M_UR50D     12           35M
esm2_t6_8M_UR50D       6            8M
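
Since the checkpoints share one interface, model size is a one-line swap. The sketch below prototypes on the smallest checkpoint before scaling up; this workflow, and the specific repository names in the mapping, are illustrative choices rather than recommendations from this card.

```python
# Sketch: swapping checkpoint sizes is a one-line change.
from transformers import AutoModelForMaskedLM, AutoTokenizer

CHECKPOINTS = {
    "8M": "facebook/esm2_t6_8M_UR50D",
    "650M": "facebook/esm2_t33_650M_UR50D",
    "15B": "nvidia/esm2_t48_15B_UR50D",  # the TE-optimized release on this page
}

name = CHECKPOINTS["8M"]  # start small; scale up once the pipeline works
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
# Note: the 15B entry additionally needs trust_remote_code=True (see above).
```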

Training and Evaluation Datasets:

Training Datasets:

Link: UniRef90

Data Modality:

  • Text (Protein Sequences)

Text Training Data Size:

  • 1 Billion to 10 Trillion Tokens

Data Collection Method:

  • Human

Labeling Method:

  • N/A

Properties (Quantity, Dataset Descriptions, Sensor(s)): UniRef90 clusters are generated from the UniRef100 seed sequences with a 90% sequence identity threshold using the MMseqs2 algorithm. The seed sequences are the longest members of the UniRef100 cluster. However, the longest sequence is not always the most informative. There is often more biologically relevant information and annotation (name, function, cross-references) available on other cluster members. All the proteins in each cluster are ranked to facilitate the selection of a biologically relevant representative for the cluster.

Link: UniRef50

Data Modality:

  • Text (Protein Sequences)

Text Training Data Size:

  • 1 Billion to 10 Trillion Tokens

Data Collection Method:

  • Human

Labeling Method:

  • N/A

Properties: UniRef50 clusters are generated from the UniRef90 seed sequences with a 50% sequence identity threshold using the MMseqs2 algorithm. The seed sequences are the longest members of the UniRef90 cluster. However, the longest sequence is not always the most informative. There is often more biologically relevant information and annotation (name, function, cross-references) available on other cluster members. All the proteins in each cluster are ranked to facilitate the selection of a biologically relevant representative for the cluster.

Evaluation Datasets:

Link: Continuous Automated Model Evaluation (CAMEO)

Benchmark Score: 0.72

Data Collection Method:

  • Human

Labeling Method:

  • N/A

Properties: The data is collected by taking sequences of protein structures that are about to be released weekly by the Protein Data Bank (PDB). These sequences are sent as "blind targets" to participating protein structure prediction servers, which then return their predictions.

Link: CASP14 (Critical Assessment of Methods of Protein Structure Prediction)

Benchmark Score: 0.55

Data Collection Method:

  • Human

Labeling Method:

  • N/A

Properties: The data for CASP14 targets is collected from protein structures that are newly solved by experimental structural biologists. The CASP organizers receive the amino acid sequences of these proteins before their full, three-dimensional structures are publicly released in the Protein Data Bank (PDB). They then provide these sequences to participating research groups and servers, who must submit their predicted structures within a specific time frame.

Inference:

Acceleration Engine:

  • Hugging Face Transformers

Test Hardware:

  • A100
  • H100
  • H200
  • GB200

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards.

Please report model quality, risk, security vulnerabilities, or NVIDIA AI Concerns here.
