Frank

frank0125

AI & ML interests

Speech Modeling

Organizations

None yet

upvoted 3 papers 6 days ago

Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data

Paper • 2603.07534 • Published 12 days ago • 5

DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning

Paper • 2603.12257 • Published 8 days ago • 31

ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation

Paper • 2603.11421 • Published 9 days ago • 34

upvoted 3 papers about 2 months ago

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions

Paper • 2601.17640 • Published Jan 25 • 5

daVinci-Dev: Agent-native Mid-training for Software Engineering

Paper • 2601.18418 • Published Jan 26 • 126

Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis

Paper • 2601.14417 • Published Jan 20 • 5

upvoted 3 papers 8 months ago

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report

Paper • 2508.01059 • Published Aug 1, 2025 • 34

Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 273

Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe

Paper • 2508.01691 • Published Aug 3, 2025 • 10

upvoted 3 papers 10 months ago

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published May 22, 2025 • 58

NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

Paper • 2505.16938 • Published May 22, 2025 • 121

Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits

Paper • 2505.14648 • Published May 20, 2025 • 9