---
license: cc-by-nc-nd-4.0
language:
- en
base_model: EleutherAI/pythia-410m
library_name: transformers
tags:
- biology
- scRNAseq
---

# Overview

This is the C2S-Pythia-410m-cell-type-conditioned-cell-generation model. It is built on the Pythia-410m architecture
developed by EleutherAI and fine-tuned with Cell2Sentence (C2S) on a large collection of single-cell RNA sequencing
(scRNA-seq) datasets from CellxGene and the Human Cell Atlas. Cell2Sentence adapts large language models (LLMs) to
single-cell biology by converting scRNA-seq data into "cell sentences": sequences of gene names ordered by expression
level. This model is trained for cell type-conditioned single-cell generation: given a cell type, it generates a
single-cell profile for that type.

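To make the "cell sentence" idea concrete, here is a minimal sketch of the conversion for one cell. It is not the Cell2Sentence library's implementation: the function name `cell_to_sentence`, the `top_k` parameter, the choice to drop genes with zero counts, and the toy counts are all assumptions made for illustration.

```python
import numpy as np

def cell_to_sentence(expression, gene_names, top_k=200):
    """Return gene names ordered by descending expression (illustrative sketch).

    Assumptions: genes with zero counts are dropped, and ties keep the
    original gene order. Neither detail is taken from the C2S code base.
    """
    expression = np.asarray(expression, dtype=float)
    if expression.shape != (len(gene_names),):
        raise ValueError("expression and gene_names must have the same length")
    # Stable sort on negated values so that equal counts keep their input order.
    order = np.argsort(-expression, kind="stable")
    # Keep only genes that were actually detected in this cell.
    expressed = [i for i in order if expression[i] > 0]
    return " ".join(gene_names[i] for i in expressed[:top_k])

# Toy counts for a single cell; the values are made up.
counts = [0, 12, 3, 7, 0]
genes = ["GAPDH", "CD3E", "ACTB", "MALAT1", "B2M"]
print(cell_to_sentence(counts, genes, top_k=3))  # CD3E MALAT1 ACTB
```

The resulting string is plain text, which is what lets a standard causal language model be fine-tuned on it without changes to the tokenizer or architecture.
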
# Training Data

This model was trained on over 57 million human and mouse cells gathered from more than 800 scRNA-seq datasets from
CellxGene and the Human Cell Atlas. The data cover a broad range of cell types and conditions across multiple tissues
in both species.

Cell sentences used for training were limited to the top 200 genes per cell.

# Tasks

This model is designed for:

- Cell type-conditioned single-cell generation: generating synthetic single-cell profiles that reflect the gene expression patterns of a specified cell type.

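The task above can be sketched with the standard `transformers` causal language model API. This is a hedged example, not the authors' documented workflow: the repository ID in `MODEL_ID` is assumed from the model name, the prompt wording in `build_prompt` is hypothetical (the prompt templates actually used for fine-tuning are in the Cell2Sentence GitHub repository), and the sampling settings are arbitrary.

```python
# Assumed Hugging Face repository ID, inferred from the model name; verify before use.
MODEL_ID = "vandijklab/C2S-Pythia-410m-cell-type-conditioned-cell-generation"

def build_prompt(cell_type, n_genes=200):
    # Hypothetical prompt wording; replace it with the template used in training.
    return (
        f"Generate a list of {n_genes} genes in order of descending expression "
        f"which represent a {cell_type} cell. Cell sentence:"
    )

def parse_cell_sentence(text, n_genes=200):
    """Split generated text into gene names, dropping repeats and keeping first occurrences."""
    genes = []
    for token in text.split():
        gene = token.strip(".,;")
        if gene and gene not in genes:
            genes.append(gene)
    return genes[:n_genes]

def generate_cell(cell_type, model_id=MODEL_ID, max_new_tokens=800):
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(cell_type), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling, so repeated calls yield different synthetic cells
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return parse_cell_sentence(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Calling `generate_cell("T cell")` would download the weights and return a list of gene names in rank order. Repeated genes are dropped in `parse_cell_sentence` because a rank ordering cannot contain the same gene twice.
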
# Cell2Sentence Links

- GitHub: https://github.com/vandijklab/cell2sentence
- Paper: https://www.biorxiv.org/content/10.1101/2023.09.11.557287v3

# Pythia Links

- Paper: https://arxiv.org/pdf/2304.01373
- Hugging Face: https://huggingface.co/EleutherAI/pythia-410m