Rocl Jamez's picture

55

Rocl Jamez

James62

·

AI & ML interests

None yet

Organizations

None yet

James62's activity

upvoted a paper 10 months ago

Masked Audio Generation using a Single Non-Autoregressive Transformer

Paper • 2401.04577 • Published Jan 9 • 41

upvoted 2 papers 11 months ago

SparQ Attention: Bandwidth-Efficient LLM Inference

Paper • 2312.04985 • Published Dec 8, 2023 • 38

Merlin:Empowering Multimodal LLMs with Foresight Minds

Paper • 2312.00589 • Published Nov 30, 2023 • 24

upvoted 17 papers 12 months ago

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

Paper • 2311.08667 • Published Nov 15, 2023 • 18

Single-Image 3D Human Digitization with Shape-Guided Diffusion

Paper • 2311.09221 • Published Nov 15, 2023 • 20

DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

Paper • 2311.09217 • Published Nov 15, 2023 • 21

Drivable 3D Gaussian Avatars

Paper • 2311.08581 • Published Nov 14, 2023 • 46

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Paper • 2311.07885 • Published Nov 14, 2023 • 39

Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

Paper • 2311.07446 • Published Nov 13, 2023 • 28

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 12

ChatAnything: Facetime Chat with LLM-Enhanced Personas

Paper • 2311.06772 • Published Nov 12, 2023 • 34

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26

Music ControlNet: Multiple Time-varying Controls for Music Generation

Paper • 2311.07069 • Published Nov 13, 2023 • 43

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

Paper • 2311.05997 • Published Nov 10, 2023 • 36

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

Paper • 2311.06214 • Published Nov 10, 2023 • 29

GPT4All: An Ecosystem of Open Source Compressed Language Models

Paper • 2311.04931 • Published Nov 6, 2023 • 20

u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model

Paper • 2311.05348 • Published Nov 9, 2023 • 11

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

Paper • 2311.05332 • Published Nov 9, 2023 • 9

Prompt Cache: Modular Attention Reuse for Low-Latency Inference

Paper • 2311.04934 • Published Nov 7, 2023 • 28

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 45