MoonDream2 Fine-Tuning on Med VQA RAD Dataset

Description

This project fine-tunes the MoonDream2 model on the Med VQA RAD dataset to improve medical visual question answering (VQA) capabilities. Hyperparameters are tuned with Optuna, and training progress is tracked with Weights & Biases (W&B).

Training Environment

  • Hardware: NVIDIA GPU (CUDA-enabled)
  • Frameworks: PyTorch, Hugging Face Transformers
  • Optimizer: Adam8bit (from bitsandbytes)
  • Batch Processing: PyTorch DataLoader
  • Hyperparameter Tuning: Optuna
  • Logging: Weights & Biases (W&B)

Dataset

  • Name: Med VQA RAD
  • Content: Medical visual question-answering dataset with radiology images and associated Q&A pairs.
  • Preprocessing: Images are processed through MoonDream2's vision encoder.
  • Tokenization: Text is tokenized with Hugging Face's tokenizer.
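A minimal sketch of this loading and preprocessing step, assuming the public VQA-RAD mirror flaviagiammarino/vqa-rad on the Hugging Face Hub and MoonDream2's remote-code encode_image helper (neither is named in this card):

```python
# Loading/preprocessing sketch. The dataset ID and the encode_image call are
# assumptions based on common MoonDream2 usage, not taken from this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

dataset = load_dataset("flaviagiammarino/vqa-rad")  # hypothetical HF mirror of VQA-RAD

tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True
).to("cuda")

sample = dataset["train"][0]
image_embeds = model.encode_image(sample["image"])       # 729 image tokens
question_ids = tokenizer(sample["question"]).input_ids   # tokenized question text
```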

Training Parameters

  • Model: vikhyatk/moondream2 (~1.87B parameters, FP16)
  • Number of Image Tokens: 729
  • Learning Rate (LR): Tuned via Optuna (log-uniform search between 1e-6 and 1e-4)
  • Batch Size: 3
  • Gradient Accumulation Steps: 8 / batch size (targeting an effective batch size of roughly 8)
  • Optimizer: Adam8bit (betas=(0.9, 0.95), eps=1e-6)
  • Loss Function: Cross-entropy loss computed on token-level outputs
  • Scheduler: Cosine Annealing with warm-up (10% of total steps)
  • Epochs: Tuned via Optuna (search range: 1–2 epochs)
  • Validation Strategy: Loss-based evaluation on the validation set
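The setup above might look as follows; this is a sketch, not the card's actual code. The text_model attribute and the LR/epoch placeholders are assumptions, and transformers' cosine helper warms up from 0 rather than from 0.1 × LR as described in the next section. collate_fn is defined in the Training Process sketch further down.

```python
# Optimizer/scheduler sketch matching the parameters above. dataset and model
# come from the loading sketch; LR/EPOCHS are placeholders that Optuna would
# overwrite per trial.
import math
from bitsandbytes.optim import Adam8bit
from torch.utils.data import DataLoader
from transformers import get_cosine_schedule_with_warmup

BATCH_SIZE = 3
GRAD_ACCUM_STEPS = max(1, 8 // BATCH_SIZE)  # "8 / batch size" from the card
LR, EPOCHS = 3e-5, 2                        # placeholders within the search ranges

# collate_fn is defined in the Training Process sketch below
train_loader = DataLoader(dataset["train"], batch_size=BATCH_SIZE,
                          shuffle=True, collate_fn=collate_fn)
total_steps = EPOCHS * math.ceil(len(train_loader) / GRAD_ACCUM_STEPS)

optimizer = Adam8bit(model.text_model.parameters(), lr=LR,
                     betas=(0.9, 0.95), eps=1e-6)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),  # 10% warm-up
    num_training_steps=total_steps,
)
```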

Training Process

  1. Collate Function (steps 1 and 2 are sketched in the first code block after this list):
    • Prepares image embeddings using MoonDream2’s vision encoder.
    • Converts question-answer pairs into tokenized sequences.
    • Pads sequences to ensure uniform input length.
  2. Loss Computation:
    • Generates text embeddings.
    • Concatenates image and text embeddings.
    • Computes loss using MoonDream2’s causal language model.
  3. Learning Rate Scheduling:
    • Starts at 0.1 × LR and gradually increases.
    • Uses cosine decay after warm-up.
  4. Hyperparameter Optimization (see the second sketch after this list):
    • Optuna optimizes learning rate and epoch count.
    • Trials are pruned if performance is suboptimal.
  5. Logging & Monitoring:
    • W&B logs loss, learning rate, and training progress.
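A sketch of steps 1 and 2, modeled on MoonDream2's published fine-tuning example; the prompt format, the -100 label masking, and the text_model embedding path are assumptions and may differ from the code behind this card:

```python
import torch

IMG_TOKENS = 729  # image tokens MoonDream2 prepends to the text sequence

def collate_fn(batch):
    """Step 1: encode images, tokenize Q&A pairs, pad to a uniform length."""
    images, token_seqs, label_seqs = [], [], []
    for sample in batch:
        images.append(sample["image"])
        q_ids = tokenizer(f"\n\nQuestion: {sample['question']}\n\nAnswer:").input_ids
        a_ids = tokenizer(f" {sample['answer']}{tokenizer.eos_token}").input_ids
        token_seqs.append(torch.tensor(q_ids + a_ids))
        # mask image and question positions so only answer tokens are scored
        label_seqs.append(torch.tensor([-100] * (IMG_TOKENS + len(q_ids)) + a_ids))
    tokens = torch.nn.utils.rnn.pad_sequence(
        token_seqs, batch_first=True, padding_value=tokenizer.eos_token_id)
    labels = torch.nn.utils.rnn.pad_sequence(
        label_seqs, batch_first=True, padding_value=-100)
    return images, tokens, labels

def compute_loss(batch):
    """Step 2: concatenate image and text embeddings, score with the causal LM."""
    images, tokens, labels = batch
    tokens, labels = tokens.to("cuda"), labels.to("cuda")
    with torch.no_grad():  # keep the vision encoder frozen in this sketch
        img_embeds = torch.stack(
            [model.encode_image(img).squeeze(0) for img in images])
    txt_embeds = model.text_model.get_input_embeddings()(tokens)
    inputs_embeds = torch.cat((img_embeds, txt_embeds), dim=1)
    outputs = model.text_model(inputs_embeds=inputs_embeds, labels=labels)
    return outputs.loss  # token-level cross-entropy
```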
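Steps 3–5 could be wired together as below. The W&B project name, trial count, and evaluate helper are hypothetical, and the optimizer and scheduler from the earlier sketch would be rebuilt inside each trial with the sampled learning rate:

```python
import optuna
import wandb

def objective(trial):
    """One Optuna trial: sample hyperparameters, train, report validation loss."""
    lr = trial.suggest_float("lr", 1e-6, 1e-4, log=True)  # log-uniform LR search
    epochs = trial.suggest_int("epochs", 1, 2)
    # rebuild optimizer/scheduler with the sampled lr (see the setup sketch above)
    wandb.init(project="moondream2-med-vqa", reinit=True,
               config={"lr": lr, "epochs": epochs})
    step = 0
    for epoch in range(epochs):
        for batch in train_loader:
            loss = compute_loss(batch) / GRAD_ACCUM_STEPS
            loss.backward()
            step += 1
            if step % GRAD_ACCUM_STEPS == 0:
                optimizer.step()
                scheduler.step()
                optimizer.zero_grad()
            wandb.log({"train/loss": loss.item(),
                       "lr": scheduler.get_last_lr()[0]})
        val_loss = evaluate(val_loader)  # hypothetical helper: mean validation loss
        trial.report(val_loss, epoch)
        if trial.should_prune():         # drop underperforming trials early
            wandb.finish()
            raise optuna.TrialPruned()
    wandb.finish()
    return val_loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10)  # trial count is illustrative
```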

Results

  • Best Hyperparameters: Selected via Optuna trials.
  • Final Validation Loss: Computed and logged.
  • Model Performance: Evaluated using token-wise accuracy and qualitative assessment.
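The card does not define token-wise accuracy; one plausible reading, sketched here, is exact-match accuracy over the unmasked answer tokens after the usual causal shift:

```python
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Exact-match accuracy over answer tokens (an assumed metric definition)."""
    preds = logits[:, :-1, :].argmax(dim=-1)  # position t predicts token t+1
    targets = labels[:, 1:]
    mask = targets != -100                    # skip image/question/pad positions
    return (preds[mask] == targets[mask]).float().mean().item()
```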
