MoonDream2 Fine-Tuning on Med VQA RAD Dataset
Description
This project fine-tunes the MoonDream2 model on the Med VQA RAD dataset to improve medical visual question answering (VQA). Hyperparameters are tuned with Optuna, and training progress is tracked with Weights & Biases (W&B).
Training Environment
- Hardware: NVIDIA GPU (CUDA enabled)
- Frameworks: PyTorch, Hugging Face Transformers
- Optimizer: Adam8bit (from bitsandbytes)
- Batch Processing: DataLoader (Torch)
- Hyperparameter Tuning: Optuna
- Logging: Weights & Biases (W&B)
Dataset
- Name: Med VQA RAD
- Content: Medical visual question-answering dataset with radiology images and associated Q&A pairs.
- Preprocessing: Images are processed through MoonDream2's vision encoder.
- Tokenization: Text is tokenized with the Hugging Face tokenizer (see the loading sketch below).
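A minimal loading sketch. The card does not name the exact HF dataset repo; `flaviagiammarino/vqa-rad` is a public VQA-RAD mirror used here as an assumption, as are the `image`/`question`/`answer` field names.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed dataset id: the card does not name the exact repo it used.
dataset = load_dataset("flaviagiammarino/vqa-rad")

tokenizer = AutoTokenizer.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True
)

sample = dataset["train"][0]  # assumed fields: image, question, answer
question_ids = tokenizer(sample["question"], add_special_tokens=False).input_ids
answer_ids = tokenizer(sample["answer"], add_special_tokens=False).input_ids
```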
Training Parameters
- Model: vikhyatk/moondream2
- Number of Image Tokens: 729
- Learning Rate (LR): Tuned via Optuna (log-uniform search between 1e-6 and 1e-4)
- Batch Size: 3
- Gradient Accumulation Steps: 8 / batch size (accumulation scales inversely with batch size, keeping the effective batch size near 8)
- Optimizer: Adam8bit (betas=(0.9, 0.95), eps=1e-6)
- Loss Function: Cross-entropy loss computed on token-level outputs
- Scheduler: Cosine annealing with warm-up over the first 10% of total steps (see the optimizer/scheduler sketch after this list)
- Epochs: Tuned via Optuna (search range: 1-2)
- Validation Strategy: Loss-based evaluation on validation set
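A sketch of how these parameters could be wired together. The schedule reconstructs the behavior described in the Training Process section (warm-up starting at 0.1 × LR over the first 10% of steps, then cosine decay) with a `LambdaLR`; `LR` and `TOTAL_STEPS` are placeholders, and this is an illustration rather than the project's exact code.

```python
import math
import torch
from bitsandbytes.optim import Adam8bit
from transformers import AutoModelForCausalLM

DEVICE = "cuda"
LR = 3e-5                  # placeholder; Optuna searches 1e-6 to 1e-4
BATCH_SIZE = 3
GRAD_ACCUM_STEPS = max(1, 8 // BATCH_SIZE)
TOTAL_STEPS = 1000         # placeholder: len(train_loader) * epochs // GRAD_ACCUM_STEPS
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True
).to(DEVICE)

optimizer = Adam8bit(model.parameters(), lr=LR, betas=(0.9, 0.95), eps=1e-6)

def lr_lambda(step):
    # Linear warm-up from 0.1 * LR up to LR, then cosine decay to zero.
    if step < WARMUP_STEPS:
        return 0.1 + 0.9 * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```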
Training Process
- Collate Function (see the collate-and-loss sketch after this list):
- Prepares image embeddings using MoonDream2’s vision encoder.
- Converts question-answer pairs into tokenized sequences.
- Pads sequences to ensure uniform input length.
- Loss Computation:
- Generates text embeddings.
- Concatenates image and text embeddings.
- Computes loss using MoonDream2’s causal language model.
- Learning Rate Scheduling:
- Starts at 0.1 × LR and gradually increases.
- Uses cosine decay after warm-up.
- Hyperparameter Optimization (see the Optuna sketch after this list):
- Optuna optimizes learning rate and epoch count.
- Trials are pruned if performance is suboptimal.
- Logging & Monitoring:
- W&B logs loss, learning rate, and training progress.
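A condensed sketch of the collate-and-loss path described above. The attribute names `vision_encoder` and `text_model`, and the prompt template, follow the public moondream fine-tuning example and should be treated as assumptions about this project's code.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

IMG_TOKENS = 729  # image embedding slots prepended to the text sequence

def compute_loss(model, tokenizer, batch, device="cuda"):
    # Encode images with the vision encoder (assumed attribute name).
    img_embeds = model.vision_encoder([s["image"] for s in batch])

    input_ids, labels = [], []
    for s in batch:
        q = tokenizer(f"\n\nQuestion: {s['question']}\n\nAnswer:",
                      add_special_tokens=False).input_ids
        a = tokenizer(" " + s["answer"] + tokenizer.eos_token,
                      add_special_tokens=False).input_ids
        input_ids.append(torch.tensor(q + a))
        # Mask the image slots and the question with -100 so cross-entropy
        # covers only the answer tokens.
        labels.append(torch.tensor([-100] * (IMG_TOKENS + len(q)) + a))

    input_ids = pad_sequence(input_ids, batch_first=True,
                             padding_value=tokenizer.eos_token_id).to(device)
    labels = pad_sequence(labels, batch_first=True, padding_value=-100).to(device)

    # Text embeddings, concatenated after the image embeddings.
    txt_embeds = model.text_model.get_input_embeddings()(input_ids)
    inputs_embeds = torch.cat([img_embeds, txt_embeds], dim=1)

    # The causal LM returns token-level cross-entropy when labels are passed.
    out = model.text_model(inputs_embeds=inputs_embeds, labels=labels)
    return out.loss
```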
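And a sketch of the Optuna objective with pruning and W&B logging. `train_one_epoch` and `evaluate_on_validation` are hypothetical helpers standing in for the training loop above, and the project name is illustrative.

```python
import optuna
import wandb

def objective(trial):
    lr = trial.suggest_float("lr", 1e-6, 1e-4, log=True)  # log-uniform search
    epochs = trial.suggest_int("epochs", 1, 2)

    wandb.init(project="moondream2-vqa-rad",
               config={"lr": lr, "epochs": epochs}, reinit=True)

    for epoch in range(epochs):
        train_loss = train_one_epoch(lr)     # hypothetical helper
        val_loss = evaluate_on_validation()  # hypothetical helper
        wandb.log({"train_loss": train_loss, "val_loss": val_loss, "lr": lr})

        # Report to Optuna so underperforming trials can be pruned early.
        trial.report(val_loss, step=epoch)
        if trial.should_prune():
            wandb.finish()
            raise optuna.TrialPruned()

    wandb.finish()
    return val_loss

study = optuna.create_study(direction="minimize",
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=10)
print(study.best_params)
```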
Results
- Best Hyperparameters: Selected via Optuna trials.
- Final Validation Loss: Computed and logged.
- Model Performance: Evaluated using token-wise accuracy (sketched below) and qualitative assessment.
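The card does not define token-wise accuracy; one common reading, sketched below, is the fraction of supervised (non-masked) label positions where the model's argmax prediction matches the target.

```python
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    # logits: (B, T, V) from the causal LM; labels: (B, T) with -100 on
    # positions excluded from the loss. Shift by one so predictions line
    # up with next-token targets, as in the LM loss.
    preds = logits[:, :-1, :].argmax(dim=-1)
    targets = labels[:, 1:]
    mask = targets != -100
    return ((preds == targets) & mask).sum().item() / mask.sum().item()
```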