license: other
license_name: hai-def
license_link: https://developers.google.com/health-ai-developer-foundations/terms
language:
- en
tags:
- medical
- pathology
- digital-pathology
- medical-embeddings
extra_gated_heading: Access Path Foundation on Hugging Face
extra_gated_prompt: >-
To access Path Foundation on Hugging Face, you're required to review and agree
to [Health AI Developer Foundation's terms of
use](https://developers.google.com/health-ai-developer-foundations/terms). To
do this, please ensure you’re logged in to Hugging Face and click below.
Requests are processed immediately.
extra_gated_button_content: Acknowledge license
Path Foundation model card
Model documentation: Path Foundation
Resources:
- Model on Google Cloud Model Garden: Path Foundation
- Model on Hugging Face: google/path-foundation
- GitHub repository (supporting code, Colab notebooks, discussions, and issues): path-foundation
- Quick start notebook: notebooks/quick_start
- Support: See Contact.
Terms of use: Health AI Developer Foundations terms of use
Author: Google
Model information
This section describes the Path Foundation model and how to use it.
Description
Path Foundation is a machine learning model for use in histopathology applications. It produces embeddings that can be used to efficiently train classifier models for pathology analysis tasks on hematoxylin and eosin (H&E) patches from whole slide images (WSIs) with less data and less compute. Path Foundation is trained using self-supervised learning to create embeddings from 224 x 224 pixel image patches from histopathology WSIs. The embeddings returned by Path Foundation are 384-dimensional vectors of floating point values that represent a projection of the original image into a compressed feature space.
You can read more about the research and underlying model in our manuscript, Domain-specific optimization and diverse evaluation of self-supervised models for histopathology.
How to use
The following example code snippet helps you quickly get started running the model locally. If you want to use the model at scale, we recommend that you create a production version using Model Garden.
from PIL import Image as PILImage
from huggingface_hub import hf_hub_download, from_pretrained_keras
import tensorflow as tf
import numpy as np
# Download a test image from Hugging Face Hub
hf_hub_download(repo_id="google/path-foundation", filename='Test.png', local_dir='.')
# Open the image, crop it to match expected input size.
img = PILImage.open("Test.png").crop((0, 0, 224, 224)).convert('RGB')
# Convert the image to a float32 tensor, add a batch dimension, and scale pixel values to [0, 1]
tensor = tf.cast(tf.expand_dims(np.array(img), axis=0), tf.float32) / 255.0
# Load the model directly from Hugging Face Hub
loaded_model = from_pretrained_keras("google/path-foundation")
# Call inference
infer = loaded_model.signatures["serving_default"]
embeddings = infer(tf.constant(tensor))
# Extract the embedding vector
embedding_vector = embeddings['output_0'].numpy().flatten()
Examples
See the following Colab notebooks for examples of how to use Path Foundation:
To give the model a quick try, running it locally with weights from Hugging Face, see Quick start notebook in Colab.
For an example of how to use the model to train a linear classifier using data from Google Cloud DICOM Store, see DICOM linear classifier notebook in Colab.
For an example of how to use the model to train a linear classifier using data from Google Cloud Storage (GCS), see GCS endpoint linear classifier notebook in Colab.
Model architecture overview
Path Foundation uses the ViT-S architecture and was trained using Masked Siamese Networks across magnifications with domain-specific tuning and optimization. The resulting feature representations provided by the model offer robust input for downstream tasks in histopathology. Additional information can be found in the preprint Domain-specific optimization and diverse evaluation of self-supervised models for histopathology.
Technical specifications
- Model type: ViT-S architecture
- Manuscript: Domain-specific optimization and diverse evaluation of self-supervised models for histopathology
- Model created: 2023-12-19
- Model version: 1.0.0
Performance and validation
Linear probe evaluation was conducted across a diverse set of 11 benchmark tasks involving 17 unique tissue types and spanning different optimal magnifications and task types. See the manuscript for more details, including additional results for slide-level tasks (e.g., tissue type classification and molecular findings) and fine-tuning with data titration.
Key performance metrics
- 93% - Linear probing AUC for a suite of histopathology classification tasks (95% CI: 92.9 - 93.8)
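For readers less familiar with the metric, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the evaluation above.

```python
import numpy as np


def roc_auc(labels, scores):
    """Compute ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    positives = scores[labels]
    negatives = scores[~labels]
    if positives.size == 0 or negatives.size == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Pairwise score differences between every positive and every negative.
    diff = positives[:, None] - negatives[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)
print(auc)  # 0.75
```

The pairwise form uses memory proportional to the number of positives times the number of negatives, so it suits small evaluation sets; rank-based implementations scale better.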
Inputs and outputs
Input: Image patch of 224 x 224 pixels from H&E Whole Slide Images (WSIs).
Path Foundation is closely integrated with EZ-WSI, a library for digital pathology that lets you process WSIs to patches and send them to the model.
Output: Embedding vector of floating point values (Dimensions: 384).
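To make the input contract concrete, here is a minimal sketch that splits a larger RGB region into non-overlapping 224 x 224 patches. It works on an in-memory NumPy array and does not use EZ-WSI or read real WSI files. The helper name `tile_into_patches` and the choice to drop partial edge tiles are assumptions made for illustration.

```python
import numpy as np

PATCH_SIZE = 224  # input size expected by Path Foundation


def tile_into_patches(image, patch_size=PATCH_SIZE):
    """Split an (H, W, C) array into non-overlapping square patches.

    Partial tiles at the right and bottom edges are dropped.
    """
    height, width, channels = image.shape
    rows, cols = height // patch_size, width // patch_size
    patches = [
        image[r * patch_size:(r + 1) * patch_size,
              c * patch_size:(c + 1) * patch_size]
        for r in range(rows)
        for c in range(cols)
    ]
    if not patches:
        return np.empty((0, patch_size, patch_size, channels), dtype=image.dtype)
    return np.stack(patches)


# A blank 500 x 700 region yields 2 rows x 3 columns of full patches.
region = np.zeros((500, 700, 3), dtype=np.uint8)
patches = tile_into_patches(region)
print(patches.shape)  # (6, 224, 224, 3)
```

Each resulting patch would produce one 384-dimensional embedding, so a region tiled this way maps to an array of shape (number of patches, 384).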
Dataset details
Training dataset
Training data consisted of hematoxylin and eosin stained (H&E) WSIs from The Cancer Genome Atlas (TCGA), accessed at https://portal.gdc.cancer.gov. Training was performed using 60 million patches across three magnifications (~2 µm/pixel, ~1 µm/pixel, ~0.5 µm/pixel) and across the 32 solid tumor TCGA studies (representing different cancer types and with training data including both tumor and diverse, non-tumor patches).
Labeling
The model was trained using self-supervised learning, meaning no supervised labels were used. Labels used to measure model performance on downstream tasks were provided either through pathologist annotation or slide-level metadata.
Additional information about data and labels used for downstream tasks can be found in the following references:
- Bejnordi, B. et al. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA (2017).
- Jaroensri, R. et al. Deep learning models for histologic grading of breast cancer and association with disease prognosis. npj Breast Cancer 8, 1–12 (2022).
- Liu, Y. et al. Artificial Intelligence-Based Breast Cancer Nodal Metastasis Detection: Insights Into the Black Box for Pathologists. Arch. Pathol. Lab. Med. 143, (2019).
- Lai, J. et al. Domain-specific optimization and diverse evaluation of self-supervised models for histopathology. arXiv (2023).
- Nagpal, K. et al. Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens. JAMA Oncol 6, 1372–1380 (2020).
- Nagpal, K. et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. npj Digital Medicine 2, 1–10 (2019).
- Sadhwani, A. et al. Comparative analysis of machine learning approaches to classify tumor mutation burden in lung adenocarcinoma using histopathology images. Sci. Rep. 11, 1–11 (2021).
- Wulczyn, E. et al. Interpretable survival prediction for colorectal cancer using deep learning. NPJ Digital Medicine 4, (2021).
- Weng, WH. et al. Multimodal Multitask Representation Learning for Pathology Biobank Metadata Prediction. arXiv (2019).
License
The use of Path Foundation is governed by the Health AI Developer Foundations terms of use.
Data citation
The results of Path Foundation are in whole or in part based upon data generated by the TCGA Research Network.
Implementation information
This section provides details about the model internals.
Software
Training was done using JAX. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models.
Use and limitations
Intended use
Path Foundation can reduce the training data, compute, and technical expertise necessary to develop task-specific models for H&E pathology slides.
Embeddings from the model can be used for a variety of user-defined downstream tasks including, but not limited to: cancer detection, classification, and grading; metadata prediction (stain, tissue type, specimen type, etc.); quality assessment (e.g., imaging artifacts); and similar image search.
The embeddings can also be used to explore the feature space of histopathology images for biomarker development associated with prognostic and predictive tasks.
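One of the uses listed above, similar image search, can be sketched as a nearest-neighbor lookup over stored embeddings. The example below ranks by cosine similarity, which is an assumption; the source does not prescribe a distance metric. The function name `top_k_similar` is hypothetical and the embeddings are random stand-ins rather than real model outputs.

```python
import numpy as np


def top_k_similar(query, index, k=5):
    """Return indices and cosine similarities of the k index rows closest to query."""
    query = np.asarray(query, dtype=float)
    index = np.asarray(index, dtype=float)
    # Normalize to unit length so the dot product equals cosine similarity.
    query_unit = query / np.linalg.norm(query)
    index_unit = index / np.linalg.norm(index, axis=1, keepdims=True)
    similarities = index_unit @ query_unit
    order = np.argsort(-similarities)[:k]
    return order, similarities[order]


# Synthetic stand-in for a database of 100 patch embeddings (384 dimensions each).
rng = np.random.default_rng(0)
index_embeddings = rng.normal(size=(100, 384))

# A query that is a slightly perturbed copy of entry 42 should retrieve entry 42 first.
query_embedding = index_embeddings[42] + 0.01 * rng.normal(size=384)
neighbors, scores = top_k_similar(query_embedding, index_embeddings, k=3)
print(neighbors[0])  # 42
```

For large collections, an approximate nearest-neighbor index would typically replace this brute-force scan.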
Benefits
Path Foundation embeddings can be used to efficiently develop AI models for H&E histopathology image analysis with significantly less data and compute than traditional methods.
By leveraging the large set of images Path Foundation was pre-trained on, users need less data and can build more generalizable models than they could by training on more limited datasets.
Provides a rich, compressed representation of histopathology image patches.
Helps users build AI classifiers for a variety of different applications with less data and with less compute.
Limitations
Below are a number of known factors that may degrade model performance or decrease confidence in the model results:
The model has only been validated on a limited number of the many potential downstream tasks involving H&E histopathology.
This model version was trained and validated only on H&E images from a limited set of scanners and countries.
Model output may not generalize well to data from other image types, patient populations, or scanner manufacturers not used in training.
Task-specific validation remains an important aspect of downstream model development by the end user.
Training and validation were performed on patches corresponding to 5x, 10x, and 20x magnification (~2 µm/pixel, ~1 µm/pixel, and ~0.5 µm/pixel, respectively). Using input patches corresponding to magnifications other than these has not been evaluated.
The model is only used to generate embeddings of user-provided data. It does not generate any predictions or diagnoses on its own.
As with any research, developers should ensure that any downstream application is validated to understand performance using data that is appropriately representative of the intended use setting for the specific application (e.g., age, sex, gender, condition, scanner, etc.).
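Because only the magnifications listed above were evaluated, it can help to check a patch's pixel spacing before embedding it. The following is a minimal sketch of such a guard. The function name `nearest_supported_magnification` and the 25% relative tolerance are hypothetical choices, since the source gives only approximate spacings and no tolerance.

```python
# Approximate pixel spacings (µm/pixel) of the magnifications used in training and validation.
SUPPORTED_MICRONS_PER_PIXEL = {"5x": 2.0, "10x": 1.0, "20x": 0.5}


def nearest_supported_magnification(microns_per_pixel, tolerance=0.25):
    """Return the closest evaluated magnification label, or None if none is close enough.

    `tolerance` is a relative difference and is an illustrative assumption.
    """
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    label, target = min(
        SUPPORTED_MICRONS_PER_PIXEL.items(),
        key=lambda item: abs(item[1] - microns_per_pixel) / item[1],
    )
    if abs(target - microns_per_pixel) / target <= tolerance:
        return label
    return None


print(nearest_supported_magnification(0.5))   # 20x
print(nearest_supported_magnification(1.1))   # 10x
print(nearest_supported_magnification(0.25))  # None (roughly 40x, not evaluated)
```

A `None` result signals that the patch should be resampled to an evaluated spacing or that downstream results need separate validation.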