cell-cell-BERT
Configuration: R-pretrained
This model includes learned embeddings for special tokens (e.g., [CELL0], [CELL1]), acquired through continued pre-training on biomedical text.
Model Description
This is a specific configuration of the cell-cell-BERT model for extracting cell-cell interactions from biomedical text. It determines whether a sentence describes a direct biological relationship between two target cell types.
For full details, see our paper: "Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints" (bioRxiv, 2025).
- Repository: https://github.com/mizuno-group/cell-cell-bert
- Paper: https://doi.org/10.64898/2025.12.01.691726
Model Configuration
This model corresponds to the following experimental setting in the paper:
- Entity Indication: [Replacement (e.g., [CELL0]) / Boundary Marking (e.g., <E0>...)]
- Architecture: [Entity-aware (R-BERT style) / CLS-only]
- Pre-training: [Continued Pre-training (CPT) / Base (Fine-tuning only)]
Note: Please ensure your input data preprocessing matches the Entity Indication method specified above.
How to Get Started
Preprocessing Requirement: Depending on the configuration above, you must insert the corresponding special tokens into your input text before passing it to the model.
- For Replacement models: Replace cell names with [CELL0] and [CELL1].
- For Boundary models: Wrap cell names with <E0>...</E0> and <E1>...</E1>.
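The two preprocessing options can be sketched as a small helper. This is a minimal illustration, not the preprocessing code from the paper or repository: the function name `mark_cell_pair`, its `method` argument, and the choice to mark only the first exact, case-sensitive mention of each cell name are all assumptions made for the example.

```python
import re


def mark_cell_pair(sentence, cell0, cell1, method="replacement"):
    """Mark the first mention of each target cell name in `sentence`.

    Hypothetical helper: how the original pipeline handles repeated
    mentions or case differences is not specified in this card.
    Assumes `cell0` and `cell1` are two different strings.
    """
    if method == "replacement":
        marked = {cell0: "[CELL0]", cell1: "[CELL1]"}
    elif method == "boundary":
        marked = {
            cell0: f"<E0> {cell0} </E0>",
            cell1: f"<E1> {cell1} </E1>",
        }
    else:
        raise ValueError(f"unknown method: {method!r}")

    # Try longer names first so that a name containing the other one wins.
    names = sorted(marked, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    seen = set()

    def substitute(match):
        name = match.group(0)
        if name in seen:  # leave later mentions of the same name untouched
            return name
        seen.add(name)
        return marked[name]

    # A single pass over the sentence avoids re-matching inserted markers.
    return pattern.sub(substitute, sentence)


sentence = "The Macrophages activate T cells."
print(mark_cell_pair(sentence, "Macrophages", "T cells", "replacement"))
print(mark_cell_pair(sentence, "Macrophages", "T cells", "boundary"))
```

Whichever variant is used, the output format should match the Entity Indication method of the model being loaded.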
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# 1. Load the model
model_name = "mizuno-group/ccbert-[INSERT-CONFIG-NAME]"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# 2. Prepare Input
# CHANGE THIS LINE based on the Entity Indication method of this model:
# text = "The [CELL0] activate [CELL1]." # If Replacement
text = "The <E0> Macrophages </E0> activate <E1> T cells </E1>." # If Boundary Marking
# 3. Inference
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
# 0 = No Relation, 1 = Relation Exists
print(f"Predicted Class: {predicted_class_id}")
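To report a score alongside the predicted class, the two logits can be passed through a softmax. The sketch below uses plain Python and assumes the logits are available as a list of two floats (for a single input, `logits[0].tolist()` would give this). The `LABELS` mapping follows the class comment above. The helper names and the example values are hypothetical, and the resulting probability is not necessarily well calibrated.

```python
import math

# Label mapping as stated in the inference example above.
LABELS = {0: "No Relation", 1: "Relation Exists"}


def softmax(logits):
    """Convert raw logits to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits):
    """Return the label name and softmax probability of the top class."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[class_id], probs[class_id]


# Hypothetical logits for illustration only, not real model output.
example_logits = [-1.2, 2.3]
label, confidence = describe_prediction(example_logits)
print(f"{label} (p = {confidence:.3f})")
```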
Citation
@article{Yoshikawa2025CCBERT,
title = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
author = {Yoshikawa Mei and Mizuno Tadahaya and Ohto Yohei and Fujimoto Hiromi and Kusuhara Hiroyuki},
journal = {bioRxiv},
year = {2025},
doi = {10.64898/2025.12.01.691726},
url = {https://doi.org/10.64898/2025.12.01.691726}
}