Testing a QLoRA adapter for allenai/Molmo-7B-D-0924.

- Targets the attention layers of the transformer backbone and the image pooling and projection layers of the vision backbone (see the training sketch below).
- Trained on 47 screenshots of a low-poly video game with ragdoll casualties.
- Evaluated on 44 screenshots of the same game.
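
Below is a minimal sketch of the training setup described above, using bitsandbytes 4-bit quantization with a PEFT `LoraConfig`. The `target_modules` names and the LoRA hyperparameters (`r`, `lora_alpha`, dropout) are assumptions inferred from the description, not taken from the actual run; verify the module names against `model.named_modules()` before use.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: load the frozen base model in 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924",
    quantization_config=bnb_config,
    trust_remote_code=True,  # Molmo ships custom modeling code
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,            # assumed rank; not stated in the card
    lora_alpha=32,   # assumed scaling
    lora_dropout=0.05,
    bias="none",
    target_modules=[
        "att_proj", "attn_out",                  # transformer attention (assumed names)
        "image_pooling_2d", "image_projector",   # vision pooling/projection (assumed names)
    ],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```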
Molmo has an edge case where it declares that there are no humans in an image.

This custom QLoRA adapter successfully reduces the occurrence of these cases.

However, the adapter is also observed to point at non-human objects more often.
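
One way these two behaviors could be counted is by parsing the generations: Molmo emits pointed locations as XML-like `<point>`/`<points>` tags, so a response containing no such tag can be treated as a "no humans" declaration. The regex and helper below are an illustrative sketch, not the evaluation code used here.

```python
import re

# Molmo returns points as XML-like tags, e.g.
#   <point x="52.3" y="41.8" alt="person">person</point>
#   <points x1="10.1" y1="20.4" x2="63.0" y2="71.2" alt="people">people</points>
# Matches both the singular and plural tag forms.
POINT_TAG = re.compile(r"<points?\b[^>]*>")

def declares_no_humans(generation: str) -> bool:
    """True when the model's response contains no point tags at all."""
    return POINT_TAG.search(generation) is None
```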
Comparison of model performance with and without the QLoRA adapter on the eval dataset:
| Metric | Molmo-7B-D | Molmo-7B-D w/ QLoRA |
|---|---|---|
| Precision (%) | 92.1 | 80.5 |
| Recall (%) | 70.4 | 88.5 |
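
For context, precision and recall over pointed locations could be computed by greedily matching predicted points to ground-truth human annotations within a distance tolerance, as in the sketch below. The 5-unit tolerance (in Molmo's normalized 0-100 coordinate space) and the matching scheme are assumptions, not the actual evaluation protocol.

```python
def point_metrics(predicted, ground_truth, tol=5.0):
    """Greedy nearest-match precision/recall for pointed locations.

    A predicted point is a true positive if it lies within `tol` of an
    as-yet-unmatched ground-truth point (both in 0-100 coordinates).
    """
    matched = set()
    tp = 0
    for px, py in predicted:
        best = None
        for i, (gx, gy) in enumerate(ground_truth):
            if i in matched:
                continue
            d = ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
            if d <= tol and (best is None or d < best[0]):
                best = (d, i)
        if best is not None:
            matched.add(best[1])
            tp += 1
    fp = len(predicted) - tp
    fn = len(ground_truth) - tp
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall
```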
Dataset: reubk/RavenfieldDataset