Upcoming behavioral assessments will cover a large array of Qwen VLM models, and I will publish benchmarks for each.
These will be aligned to generic use cases, meaning as many tasks as possible that do not require finetuning:
- valid JSON schema output (which models produce it)
- image classification
- bounding box location
- image text identification and accuracy checking
- structural and spatial awareness
- 3D geometric object identification and awareness
- camera rotational offset
- subject fixation and awareness
- semantic association
- depth analysis
- segmentation potential
- ViT accuracy in response to image prompting
- outline and association testing
- style identification and structural awareness
- differentiation between data formats: JSON, YAML, Markdown, and many others
- use of and response to those formats under the expected prompts
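
As a rough illustration of how two of the items above (valid JSON output and bounding box location) might be scored without any finetuning, here is a minimal sketch. It is not the benchmark harness itself. The reply shape (`label` plus a four-number `bbox` in `[x1, y1, x2, y2]` order) and the names `REQUIRED_FIELDS`, `parse_reply`, `iou`, and `score_reply` are all assumptions made for the example.

```python
import json

# Hypothetical reply shape for a detection-style prompt; an assumption for
# this sketch, not a format defined by any model or benchmark.
REQUIRED_FIELDS = {"label": str, "bbox": list}


def parse_reply(text):
    """Return the parsed dict if the reply is valid JSON with the expected fields, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(obj.get(key), expected_type):
            return None
    bbox = obj["bbox"]
    # A box must be exactly four numbers (booleans are rejected explicitly).
    if len(bbox) != 4:
        return None
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox):
        return None
    return obj


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_reply(text, truth_label, truth_bbox):
    """Score one raw model reply against a ground-truth label and box."""
    obj = parse_reply(text)
    if obj is None:
        return {"valid_json": False, "label_match": False, "iou": 0.0}
    return {
        "valid_json": True,
        "label_match": obj["label"].strip().lower() == truth_label.strip().lower(),
        "iou": iou(obj["bbox"], truth_bbox),
    }
```

Calling `score_reply(raw_text, "cat", [0, 0, 10, 10])` on each raw model reply gives one record per sample. Keeping the validity flag separate from the IoU makes it possible to tell a model that localizes poorly apart from one that simply fails to emit parseable JSON.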
