Literate Goggles

literate-goggles

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 21 hours ago

Drax: Speech Recognition with Discrete Flow Matching

Paper • 2510.04162 • Published Oct 5, 2025 • 28

upvoted 2 papers 1 day ago

Attention Sinks in Diffusion Language Models

Paper • 2510.15731 • Published Oct 17, 2025 • 50

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4, 2025 • 137

upvoted a paper 2 days ago

Omni-Embed-Nemotron: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video

Paper • 2510.03458 • Published Oct 3, 2025 • 3

upvoted 2 papers 8 days ago

LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models

Paper • 2510.15227 • Published Oct 17, 2025 • 2

DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

Paper • 2602.12160 • Published 28 days ago • 38

upvoted a paper 15 days ago

Aletheia tackles FirstProof autonomously

Paper • 2602.21201 • Published 16 days ago • 6

upvoted a paper 16 days ago

RePo: Language Models with Context Re-Positioning

Paper • 2512.14391 • Published Dec 16, 2025 • 12

upvoted a paper 17 days ago

Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum

Paper • 2602.17080 • Published 21 days ago • 3

upvoted a paper 20 days ago

Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas?

Paper • 2511.07308 • Published Nov 10, 2025 • 1

upvoted a paper 24 days ago

MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

Paper • 2602.10934 • Published 29 days ago • 49

upvoted 2 papers about 1 month ago

InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems

Paper • 2506.16381 • Published Jun 19, 2025 • 4

Qwen3-TTS Technical Report

Paper • 2601.15621 • Published Jan 22 • 70

upvoted 4 papers about 2 months ago

TiDAR: Think in Diffusion, Talk in Autoregression

Paper • 2511.08923 • Published Nov 12, 2025 • 128

Music Flamingo: Scaling Music Understanding in Audio Language Models

Paper • 2511.10289 • Published Nov 13, 2025 • 18

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Paper • 2601.02151 • Published Jan 5 • 112

CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models

Paper • 2601.05329 • Published Jan 8 • 1

upvoted 3 papers 2 months ago

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Paper • 2601.00664 • Published Jan 2 • 57

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation

Paper • 2512.21734 • Published Dec 25, 2025 • 5

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published Dec 29, 2025 • 65
