Model Details
Generative AI has many practical uses, and one important use is synthetic data generation. Synthetic data matters for medical imaging because, without it, a model trained on medical images generally has to use real patient scans, which raises patient privacy concerns. Synthetic data generators help mitigate this risk.
A generator is itself trained on real data, but the images it produces can then be used to train other models, reducing the need for those models to use real patient data.
I decided to use a GAN (Generative Adversarial Network) for this task. A GAN consists of two networks trained against each other: a generator that creates images from random noise, and a discriminator that tries to tell real images from generated ones. As the generator gets better at creating images, the discriminator has more difficulty distinguishing them from real ones.
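For reference, the adversarial game described above is usually written as the standard GAN minimax objective. Here D(x) is the discriminator's estimated probability that x is a real image, and G(z) is an image generated from a noise vector z; in the getting-started code for this model, z is a 128-dimensional standard normal vector drawn with torch.randn.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

In practice, DCGAN-style training, including the PyTorch tutorial this model is adapted from, has the generator maximize log D(G(z)) rather than minimize log(1 - D(G(z))), because the latter gives weak gradients early in training when the discriminator easily rejects generated images.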
The generated images are lung CT scans, each either cancerous or non-cancerous.
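The generator in the getting-started code upsamples a 1x1 noise "image" through six nn.ConvTranspose2d layers. As a sketch, the spatial size after each layer can be checked in plain Python with the PyTorch output-size formula, assuming dilation=1 and output_padding=0 (the defaults, which the generator uses). The helper names below are hypothetical and only mirror the layer settings in that code.

```python
def conv_transpose_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of nn.ConvTranspose2d along one spatial axis.

    Assumes dilation=1 and output_padding=0, as in the Generator.
    """
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) for the six ConvTranspose2d layers of the Generator.
LAYERS = [(4, 1, 0)] + [(4, 2, 1)] * 5


def generator_output_sizes(start: int = 1) -> list[int]:
    """Spatial size after each transposed convolution, starting from a 1x1 input."""
    sizes = []
    size = start
    for kernel, stride, padding in LAYERS:
        size = conv_transpose_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(generator_output_sizes())  # [4, 8, 16, 32, 64, 128]
```

The first layer (stride 1, no padding) turns the 1x1 input into a 4x4 feature map, and each of the remaining five stride-2 layers doubles the size, so with nc=1 every generated scan is a single-channel 128x128 image.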
I used the PyTorch DCGAN tutorial as a reference and adapted it to this use case: https://colab.research.google.com/github/pytorch/tutorials/blob/gh-pages/_downloads/dcgan_faces_tutorial.ipynb
Uses
This model is a proof of concept showing that a GAN can generate synthetic lung CT image data. Such data could in turn be used to train other models, eventually reducing the need to use actual patient data.
Out-of-Scope Use
This model has not been evaluated for clinical use or for diagnosis of any kind.
Bias, Risks, and Limitations
The training data has not been evaluated for balance, since the model was built only as a proof of concept. The data comes from the NCI Imaging Data Commons (National Lung Screening Trial, NLST, collection): https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection_id=nlst
Citation:
Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S., Aerts, H. J. W. L., Homeyer, A., Lewis, R., Akbarzadeh, A., Bontempi, D., Clifford, W., Herrmann, M. D., Höfener, H., Octaviano, I., Osborne, C., Paquette, S., Petts, J., Punzo, D., Reyes, M., Schacherer, D. P., Tian, M., White, G., Ziegler, E., Shmulevich, I., Pihl, T., Wagner, U., Farahani, K. & Kikinis, R. NCI Imaging Data Commons. Cancer Res. 81, 4188–4193 (2021). http://dx.doi.org/10.1158/0008-5472.CAN-21-0950
Recommendations
This model is a proof of concept only.
How to Get Started with the Model
Use the code below to get started with the model.
import torch
from huggingface_hub import hf_hub_download
from torch import nn
from torchvision.utils import save_image

# Generator code (DCGAN-style): maps an nz-dimensional noise vector
# to an nc-channel 128x128 image.
class Generator(nn.Module):
    def __init__(self, ngpu=1, ngf=128, nz=128, nc=1):
        super().__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # (N, nz, 1, 1) -> (N, ngf*16, 4, 4)
            nn.ConvTranspose2d(nz, ngf * 16, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 16),
            nn.LeakyReLU(0.2, inplace=True),
            # -> (N, ngf*8, 8, 8)
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # -> (N, ngf*4, 16, 16)
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # -> (N, ngf*2, 32, 32)
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # -> (N, ngf, 64, 64)
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # -> (N, nc, 128, 128), values in [-1, 1] after Tanh
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.main(x)

# Download the pretrained weights from the Hugging Face Hub and load them on CPU.
model = Generator()
weights_path = hf_hub_download('oohtmeel/ct-gan-gen', 'lung_ct_generator_gan.pth')
model.load_state_dict(torch.load(weights_path, map_location=torch.device('cpu')))

# Generate a batch of 128 images from 128-dimensional random noise vectors.
with torch.no_grad():
    out = model(torch.randn(128, 128, 1, 1))

# Save the batch as one image grid; normalize=True rescales values to [0, 1].
save_image(out, "ct_scans.png", normalize=True)
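Because the generator ends in a Tanh layer, its raw outputs lie in [-1, 1]. If the generated scans are to be used as training data for another model, one option is to map them to 8-bit grayscale with a fixed linear scale. This is a minimal sketch, not part of the released model: the helper name to_uint8 is hypothetical, and it assumes the batch has been moved to NumPy, e.g. with out.detach().cpu().numpy().

```python
import numpy as np


def to_uint8(batch: np.ndarray) -> np.ndarray:
    """Map generator outputs in [-1, 1] (from the final Tanh) to uint8 in [0, 255].

    Assumes `batch` has shape (N, 1, H, W), e.g. out.detach().cpu().numpy().
    Returns an array of shape (N, H, W) with the single channel axis dropped.
    """
    # Clip guards against values slightly outside [-1, 1], then rescale linearly.
    scaled = (np.clip(batch, -1.0, 1.0) + 1.0) * 127.5
    # Round to the nearest integer intensity and drop the channel axis.
    return np.rint(scaled).astype(np.uint8)[:, 0]


if __name__ == "__main__":
    # Stand-in for generator output: random values in [-1, 1].
    rng = np.random.default_rng(0)
    fake = rng.uniform(-1, 1, size=(4, 1, 128, 128)).astype(np.float32)
    imgs = to_uint8(fake)
    print(imgs.shape, imgs.dtype)  # (4, 128, 128) uint8
```

Unlike save_image(..., normalize=True), which rescales the saved grid by its own minimum and maximum, a fixed mapping keeps pixel intensities comparable across batches.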