Generate speech from text using reference audio
Analyze images to identify and tag content
Generate text based on input