3D Swin Transformer MAE for OPSCC CT Pretraining
Self-supervised masked autoencoder (MAE) using a 3D Swin Transformer backbone trained on cropped OPSCC neck CT volumes.
Includes asymmetry-aware loss weighting (airway + soft-tissue features) and overfitting monitoring via augmented-pair cosine similarity.
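The augmented-pair cosine-similarity monitoring mentioned above can be sketched as follows. This is an illustrative sketch, not the repository's implementation: it assumes the encoder produces a 1-D feature vector per volume, and the hypothetical `pair_cosine` helper compares embeddings of two augmented views of the same scan.

```python
import numpy as np

def pair_cosine(z1, z2, eps=1e-8):
    """Cosine similarity between encoder embeddings of two augmented
    views of the same volume (z1, z2: 1-D feature vectors)."""
    return float(z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2) + eps))
```

The idea is to track the mean pair similarity on held-out volumes during training: a sharp climb toward 1.0 on training augmentations that is not matched on validation data can flag memorization or representational collapse.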
Model Details
- Architecture: 3D Swin Transformer encoder + lightweight asymmetric decoder + auxiliary asymmetry prediction heads
- Input shape: 1×60×128×128 (single-channel CT volumes, intensities normalized to [0,1])
- Pretraining objective: Masked reconstruction (75% masking ratio) + auxiliary asymmetry regression
- Drop path rate: linear schedule up to 0.1
- Training: AdamW, lr=1e-4, batch size 2 (adjustable), early stopping and cosine-similarity monitoring
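The masked-reconstruction objective above can be sketched in NumPy. This is a minimal illustration, not the repository's code: it assumes the 60×128×128 volume is split into 4×4×4 patch tokens (the actual patch size may differ), randomly hides 75% of them, and computes MSE only over the masked tokens, as in standard MAE training.

```python
import numpy as np

def random_mask(num_tokens, mask_ratio=0.75, rng=None):
    """Boolean mask over patch tokens; True = token is hidden from the encoder."""
    rng = rng or np.random.default_rng(0)
    num_masked = int(num_tokens * mask_ratio)
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.choice(num_tokens, size=num_masked, replace=False)] = True
    return mask

def masked_mse(pred, target, mask):
    """MAE reconstruction loss: MSE averaged over masked tokens only."""
    diff = (pred - target) ** 2
    return float(diff[mask].mean())

# Assumed tokenization: 60x128x128 volume, 4x4x4 patches -> 15*32*32 tokens
num_tokens = (60 // 4) * (128 // 4) * (128 // 4)
mask = random_mask(num_tokens)  # ~75% of tokens masked
```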
Intended Use & Limitations
Primary use: Pretraining foundation for downstream OPSCC tasks (staging, segmentation, outcome prediction)
Not intended for: Direct clinical diagnosis without fine-tuning and validation
Limitations:
- Trained on limited cohort (TCIA-derived OPSCC cases)
- Assumes cropped, skull-base-to-thoracic-inlet volumes
- Asymmetry heuristics are rule-based, so subtle cases may be missed
- No multi-modal / contrast-enhanced support yet
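To make the rule-based limitation concrete, a heuristic of this kind can be as simple as comparing a volume against its left-right mirror. The `asymmetry_score` helper below is hypothetical, shown only to illustrate why such rules can miss subtle midline-preserving lesions:

```python
import numpy as np

def asymmetry_score(volume):
    """Hypothetical rule-based asymmetry score: mean absolute difference
    between the volume and its left-right mirror.
    volume: (D, H, W) array, W = left-right axis, intensities in [0, 1]."""
    mirrored = volume[:, :, ::-1]  # flip along the left-right axis
    return float(np.abs(volume - mirrored).mean())
```

A lesion that happens to be roughly mirror-symmetric, or small relative to normal anatomic variation, produces a near-zero score and slips past such a rule.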
How to Use
```bash
# 1. Clone the repo
git clone https://huggingface.co/jdmayfield/opscc-ct-mae-swin-pretrain
cd opscc-ct-mae-swin-pretrain

# 2. Install dependencies
pip install -r requirements.txt

# 3. Train (or resume from a checkpoint)
python train_mae_swin3d.py \
    --data-dir /path/to/your/cropped_volumes \
    --output-dir ./checkpoints \
    --epochs 100 \
    --batch-size 2 \
    --lr 1e-4
```
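Input volumes must match the 1×60×128×128, [0,1]-normalized shape listed under Model Details. A minimal preprocessing sketch is shown below; the `preprocess_ct` helper and the soft-tissue HU window (-200 to 300) are assumptions for illustration, not the repository's pipeline, and it expects a volume already cropped/resampled to (60, 128, 128):

```python
import numpy as np

def preprocess_ct(hu_volume, window=(-200.0, 300.0)):
    """Clip a CT volume (Hounsfield units) to a soft-tissue window,
    rescale to [0, 1], and add the channel axis expected by the model."""
    lo, hi = window
    vol = np.clip(hu_volume.astype(np.float32), lo, hi)
    vol = (vol - lo) / (hi - lo)   # rescale to [0, 1]
    return vol[None]               # (60, 128, 128) -> (1, 60, 128, 128)
```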