Image to Compositional 3D Scene Generation
Evaluate and generate text based on images and videos
A demo of HVI-CIDNet
Detect and segment objects in images using text or visual prompts
A text-to-speech model powered by SparkAudio and Mobvoi
3D-aware Video Diffusion for Video Generation Control
Tuning-free subject-driven generation
Audio to Talking Face
Generate customized images with concept highlights
Generate text responses to user prompts
Demo for audiobox-aesthetics
Interact with AI using text, images, or audio
Blazingly Fast and Embarrassingly Simple Song Generation
Segment anime images to isolate elements
Generate and refine 3D models from text prompts
Gradio demo of CogView4-6B
Explore and discover all leaderboards from the HF community
Demo for Multimodal-SAE
Generate depth maps from your images
Space demoing Phi4 MultiModal