cell-cell-BERT

Configuration: R-pretrained

This model includes learned embeddings for special tokens (e.g., [CELL0], [CELL1]), acquired through continued pre-training on biomedical text.
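As a quick sanity check (a minimal sketch, not part of the original card), you can verify that these tokens are registered in the tokenizer vocabulary; if either maps to the unknown-token id, the tokenizer was not loaded from this repository:

from transformers import AutoTokenizer

# Both special tokens should resolve to dedicated vocabulary ids,
# not to the unknown-token id.
tokenizer = AutoTokenizer.from_pretrained("mizuno-group/ccbert-R-pretrained")
ids = tokenizer.convert_tokens_to_ids(["[CELL0]", "[CELL1]"])
assert tokenizer.unk_token_id not in ids, "Special tokens missing from vocabulary"
print(ids)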

Model Description

This is a specific configuration of the cell-cell-BERT model for extracting cell-cell interactions from biomedical text. It determines whether a sentence describes a direct biological relationship between two target cell types.

For full details, see our paper: "Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints" (bioRxiv, 2025).

Model Configuration

This model corresponds to the following experimental setting in the paper:

  • Entity Indication: Replacement (cell mentions replaced with [CELL0] / [CELL1])
  • Architecture: [Entity-aware (R-BERT style) / CLS-only]
  • Pre-training: Continued Pre-training (CPT) on biomedical text

Note: Please ensure your input data preprocessing matches the Entity Indication method specified above.

How to Get Started

Preprocessing Requirement: Before feeding text to the model, you must insert the special tokens that match this model's Entity Indication method (a helper sketch follows this list).

  • For Replacement models (this configuration): replace cell names with [CELL0] and [CELL1].
  • For Boundary models: wrap cell names with <E0>...</E0> and <E1>...</E1>.
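For illustration, here is a minimal, hypothetical helper (not part of the released code) that applies the Replacement method given character-level entity spans:

# Hypothetical helper: replace two cell-type mentions with [CELL0] / [CELL1],
# given (start, end) character offsets. Spans are processed right-to-left so
# earlier offsets stay valid after each replacement.
def mark_entities(text, span0, span1):
    spans = sorted([(span0, "[CELL0]"), (span1, "[CELL1]")],
                   key=lambda item: item[0][0], reverse=True)
    for (start, end), token in spans:
        text = text[:start] + token + text[end:]
    return text

sentence = "Macrophages activate T cells via cytokine signaling."
print(mark_entities(sentence, (0, 11), (21, 28)))
# -> "[CELL0] activate [CELL1] via cytokine signaling."

With the input marked, the full inference example is: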
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 1. Load the model
model_name = "mizuno-group/ccbert-R-pretrained"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 2. Prepare Input
# This configuration uses the Replacement method, so cell mentions are
# replaced with the [CELL0] / [CELL1] special tokens:
text = "The [CELL0] activate [CELL1]."
# For Boundary Marking models, you would instead write:
# text = "The <E0> Macrophages </E0> activate <E1> T cells </E1>."

# 3. Inference
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    predicted_class_id = logits.argmax(dim=-1).item()

# 0 = No Relation, 1 = Relation Exists
print(f"Predicted Class: {predicted_class_id}")

Citation

@article{Yoshikawa2025CCBERT,
  title   = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
  author  = {Yoshikawa, Mei and Mizuno, Tadahaya and Ohto, Yohei and Fujimoto, Hiromi and Kusuhara, Hiroyuki},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.64898/2025.12.01.691726},
  url     = {https://doi.org/10.64898/2025.12.01.691726}
}