
Jaemin Cho

j-min

https://j-min.io

AI & ML interests

None yet

Recent Activity

upvoted a collection 28 days ago

MolmoAct2 Models

Collection

Collection of the base models for MolmoAct2 • 6 items • Updated 28 days ago • 20

upvoted a paper about 2 months ago

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

Paper • 2604.08516 • Published Apr 9 • 44

upvoted a paper 2 months ago

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Paper • 2603.24575 • Published Mar 25 • 18

liked a Space 2 months ago

VFig Image2SVG Demo

🎨

VFig converts any diagram image into editable SVG code.

liked a dataset 3 months ago

Malak-Mansour/RH20T

Updated May 23, 2025 • 74 • 2

New activity in j-min/IterInpaint-CLEVR 5 months ago

Adding `safetensors` variant of this model

#2 opened 9 months ago by SFconvertbot

New activity in j-min/reco_sd14_coco 5 months ago

Adding `safetensors` variant of this model

#1 opened 9 months ago by SFconvertbot

commented a paper 10 months ago

RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation

Paper • 2508.13968 • Published Aug 19, 2025 • 1

authored a paper 10 months ago

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

Paper • 2508.05954 • Published Aug 8, 2025 • 6

commented a paper 10 months ago

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

Paper • 2508.05954 • Published Aug 8, 2025 • 6

upvoted a paper 10 months ago

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

Paper • 2508.05954 • Published Aug 8, 2025 • 6

New activity in j-min/PaintSkills 11 months ago

Upload train images (in zip files)

#8 opened 11 months ago by j-min

Upload count/train_images

#7 opened 12 months ago by j-min

commented a paper 11 months ago

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Paper • 2507.13344 • Published Jul 17, 2025 • 59

authored 5 papers 11 months ago

Hierarchical Video-Moment Retrieval and Step-Captioning

Paper • 2303.16406 • Published Mar 29, 2023

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance

Paper • 2505.21876 • Published May 28, 2025 • 9

Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning

Paper • 2506.03525 • Published Jun 4, 2025 • 6

CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval

Paper • 2506.06144 • Published Jun 6, 2025

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Paper • 2507.07202 • Published Jul 9, 2025 • 25

upvoted a paper 11 months ago

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Paper • 2507.07202 • Published Jul 9, 2025 • 25
