microsoft/Phi-4-multimodal-instruct • Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 214k • 1.56k
HuggingFaceTB/SmolVLM2-500M-Video-Instruct • Image-Text-to-Text • 0.5B • Updated Apr 8, 2025 • 93.3k • 117
openai/clip-vit-large-patch14 • Zero-Shot Image Classification • 0.4B • Updated Sep 15, 2023 • 8.05M • 1.95k
Featured • Edit Video By Editing Text • 272 • Audio-based video editing using AI-generated transcription