3D Swin Transformer MAE for OPSCC CT Pretraining

Self-supervised masked autoencoder (MAE) using a 3D Swin Transformer backbone trained on cropped OPSCC neck CT volumes.
Includes asymmetry-aware loss weighting (airway + soft-tissue features) and overfitting monitoring via augmented-pair cosine similarity.

Model Details

  • Architecture: 3D Swin Transformer encoder + lightweight asymmetric decoder + auxiliary asymmetry prediction heads
  • Input shape: 1×60×128×128 (single-channel CT volumes, intensities normalized to [0,1])
  • Pretraining objective: Masked reconstruction (75% masking ratio) + auxiliary asymmetry regression
  • Drop path rate: linear schedule up to 0.1
  • Training: AdamW optimizer, lr=1e-4, batch size 2 (adjustable), early stopping with augmented-pair cosine-similarity monitoring
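The masked-reconstruction objective with asymmetry-aware weighting described above can be sketched as follows. This is a minimal sketch, not the repo's implementation: the 75% masking ratio and the airway/soft-tissue weighting come from this card, but the function names, the voxel-wise (rather than patch-wise) masking, and the exact loss reduction are assumptions.

```python
import torch

def asymmetry_weighted_mae_loss(pred, target, mask, asym_weight):
    """Masked reconstruction loss with per-voxel asymmetry weighting.

    pred, target: (B, 1, D, H, W) CT volumes normalized to [0, 1]
    mask:         (B, 1, D, H, W), 1.0 where the voxel was masked out
    asym_weight:  (B, 1, D, H, W), larger near airway / soft-tissue
                  asymmetry features (weighting scheme assumed)
    """
    err = (pred - target) ** 2
    w = mask * asym_weight  # score only masked voxels, weighted
    return (err * w).sum() / w.sum().clamp(min=1e-8)

def random_voxel_mask(shape, ratio=0.75):
    # Bernoulli voxel mask at the card's 75% masking ratio; the real
    # model presumably masks patch-wise, simplified here to voxels.
    return (torch.rand(shape) < ratio).float()
```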

Intended Use & Limitations

Primary use: Pretraining foundation for downstream OPSCC tasks (staging, segmentation, outcome prediction)
Not intended for: Direct clinical diagnosis without fine-tuning and validation

Limitations:

  • Trained on a limited cohort (TCIA-derived OPSCC cases)
  • Assumes cropped volumes spanning the skull base to the thoracic inlet
  • Asymmetry heuristics are rule-based, so subtle asymmetries may be missed
  • No multi-modal or contrast-enhanced support yet

How to Use

# 1. Clone repo
git clone https://huggingface.co/jdmayfield/opscc-ct-mae-swin-pretrain
cd opscc-ct-mae-swin-pretrain

# 2. Install deps
pip install -r requirements.txt

# 3. Train (or resume from checkpoint)
python train_mae_swin3d.py \
  --data-dir /path/to/your/cropped_volumes \
  --output-dir ./checkpoints \
  --epochs 100 \
  --batch-size 2 \
  --lr 1e-4
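
The augmented-pair cosine-similarity monitoring mentioned above can be sketched like this: encode two augmented views of the same volume and track the cosine similarity of their flattened features over training; similarity collapsing toward 1.0 for all pairs can flag representation collapse or memorization. The function name, pooling, and interpretation threshold below are assumptions, not this repo's exact implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def augmented_pair_cosine(encoder, view_a, view_b):
    """Mean cosine similarity between flattened encoder features of
    two augmented views of the same batch of CT volumes."""
    za = encoder(view_a).flatten(start_dim=1)
    zb = encoder(view_b).flatten(start_dim=1)
    return F.cosine_similarity(za, zb, dim=1).mean().item()
```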