Collections
Discover the best community collections!
Collections including paper arxiv:2407.14358
-
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 9 -
mozilla-foundation/common_voice_17_0
Viewer • Updated • 13M • 27k • 172 -
Stable Audio Open
Paper • 2407.14358 • Published • 23 -
fnlp/AnyGPT-chat
Text Generation • Updated • 1.17k • 15
-
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Paper • 2406.11768 • Published • 20 -
Investigating Decoder-only Large Language Models for Speech-to-text Translation
Paper • 2407.03169 • Published • 9 -
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper • 2407.02869 • Published • 18 -
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper • 2407.04051 • Published • 35
-
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
Paper • 2405.18503 • Published • 9 -
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Paper • 2405.20289 • Published • 10 -
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Paper • 2406.02897 • Published • 13 -
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Paper • 2406.03344 • Published • 18
-
A Novel 1D State Space for Efficient Music Rhythmic Analysis
Paper • 2111.00704 • Published -
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 53 -
Music Style Transfer with Time-Varying Inversion of Diffusion Models
Paper • 2402.13763 • Published • 9 -
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Paper • 2402.16153 • Published • 56
-
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Paper • 2402.07383 • Published • 13 -
Matcha-TTS: A fast TTS architecture with conditional flow matching
Paper • 2309.03199 • Published • 11 -
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Paper • 2402.01912 • Published • 11 -
Fast Timing-Conditioned Latent Audio Diffusion
Paper • 2402.04825 • Published • 7
-
NExT-GPT: Any-to-Any Multimodal LLM
Paper • 2309.05519 • Published • 78 -
Large Language Model for Science: A Study on P vs. NP
Paper • 2309.05689 • Published • 20 -
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Paper • 2309.06126 • Published • 16 -
Large Language Models for Compiler Optimization
Paper • 2309.07062 • Published • 23