Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models: https://arxiv.org/pdf/2406.14852
Jiayu (Mila) Wang (username: MilaWang)
AI & ML interests: Large Language Models, Multimodal Large Language Models, Reasoning, Efficient Machine Learning Systems
Recent Activity
- Updated a collection, 19 days ago: SpatialEval
- Updated a dataset, about 1 month ago: MilaWang/SpatialEval
- Authored a paper, about 2 months ago: Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
Organizations: none yet
Collections (2)
Model checkpoints and datasets used in the paper Grammar-Aligned Decoding: https://arxiv.org/abs/2405.21047
- MilaWang/Mistral-7B-Instruct-v0.2-gad-cp-merged (Text Generation • 6)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-bv4nogram0-merged (Text Generation • 9)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-slianogram3-merged (Text Generation • 6)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-cp8-merged (Text Generation • 7)
Papers (2)
Models (6)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-slianogram0-merged (Text Generation • 7)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-bv4nogram0-merged (Text Generation • 9)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-slianogram3-merged (Text Generation • 6)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-bv4nogram3-merged (Text Generation • 6)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-cp-merged (Text Generation • 6)
- MilaWang/Mistral-7B-Instruct-v0.2-gad-cp8-merged (Text Generation • 7)
Datasets (7)
- MilaWang/SpatialEval (13.9k • 86 • 2)
- MilaWang/gad-slia-no-grammar-0shots (81 • 30)
- MilaWang/gad-bv4-no-grammar-0shots (139 • 30)
- MilaWang/gad-cp (2.42k • 31)
- MilaWang/gad-cp-8shots (2.42k • 28)
- MilaWang/gad-slia-no-grammar-3shots (81 • 32)
- MilaWang/gad-bv4-no-grammar-3shots (139 • 32)