Interesting Papers - a marcelweiss Collection

marcelweiss's Collections

Interesting Papers

updated 19 days ago

These papers are interesting (to me)

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Paper • 2410.02740 • Published Oct 3, 2024 • 55
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2, 2024 • 36
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 122
EuroLLM: Multilingual Language Models for Europe

Paper • 2409.16235 • Published Sep 24, 2024 • 28
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

Paper • 2409.14988 • Published Sep 23, 2024 • 24
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published Sep 19, 2024 • 38
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published Sep 6, 2024 • 49
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Paper • 2410.03017 • Published Oct 3, 2024 • 29
Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1, 2024 • 152
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Paper • 2410.02707 • Published Oct 3, 2024 • 49
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References

Paper • 2410.05193 • Published Oct 7, 2024 • 13
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 127
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published Nov 7, 2024 • 72
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published Nov 7, 2024 • 58
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Paper • 2411.04999 • Published Nov 7, 2024 • 18
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

Paper • 2411.03590 • Published Nov 6, 2024 • 10
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 69
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5, 2024 • 71
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge

Paper • 2411.02657 • Published Nov 4, 2024 • 6
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper • 2410.24024 • Published Oct 31, 2024 • 51
How Far is Video Generation from World Model: A Physical Law Perspective

Paper • 2411.02385 • Published Nov 4, 2024 • 35
Survey of Cultural Awareness in Language Models: Text and Beyond

Paper • 2411.00860 • Published Oct 30, 2024 • 25
Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published Nov 4, 2024 • 26
DynaSaur: Large Language Agents Beyond Predefined Actions

Paper • 2411.01747 • Published Nov 4, 2024 • 37
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models

Paper • 2411.00492 • Published Nov 1, 2024 • 6
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 51
DELTA: Dense Efficient Long-range 3D Tracking for any video

Paper • 2410.24211 • Published Oct 31, 2024 • 9
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Paper • 2410.21845 • Published Oct 29, 2024 • 16
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

Paper • 2410.22325 • Published Oct 29, 2024 • 10
A Survey of Small Language Models

Paper • 2410.20011 • Published Oct 25, 2024 • 45
Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published Oct 14, 2024 • 57
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published Oct 24, 2024 • 33
LongReward: Improving Long-context Large Language Models with AI Feedback

Paper • 2410.21252 • Published Oct 28, 2024 • 18
Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Paper • 2410.19008 • Published Oct 21, 2024 • 24
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24, 2024 • 43
WorldSimBench: Towards Video Generation Models as World Simulators

Paper • 2410.18072 • Published Oct 23, 2024 • 20
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes

Paper • 2410.18084 • Published Oct 23, 2024 • 14
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes

Paper • 2410.17249 • Published Oct 22, 2024 • 43
AutoTrain: No-code training for state-of-the-art models

Paper • 2410.15735 • Published Oct 21, 2024 • 60
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors

Paper • 2410.16271 • Published Oct 21, 2024 • 85
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Paper • 2410.13232 • Published Oct 17, 2024 • 45
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Paper • 2410.13754 • Published Oct 17, 2024 • 76
MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Paper • 2410.13757 • Published Oct 17, 2024 • 33
Exploring Model Kinship for Merging Large Language Models

Paper • 2410.12613 • Published Oct 16, 2024 • 21
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Paper • 2410.10814 • Published Oct 14, 2024 • 52
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

Paper • 2410.10626 • Published Oct 14, 2024 • 40
RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published Nov 19, 2024 • 57
Soft Robotic Dynamic In-Hand Pen Spinning

Paper • 2411.12734 • Published Nov 19, 2024 • 10
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Paper • 2411.10323 • Published Nov 15, 2024 • 35
Sharingan: Extract User Action Sequence from Desktop Recordings

Paper • 2411.08768 • Published Nov 13, 2024 • 10
Hermes: A Large Language Model Framework on the Journey to Autonomous Networks

Paper • 2411.06490 • Published Nov 10, 2024 • 7
Large Language Models Can Self-Improve in Long-context Reasoning

Paper • 2411.08147 • Published Nov 12, 2024 • 67
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection

Paper • 2411.08868 • Published Nov 13, 2024 • 13
GRAPE: Generalizing Robot Policy via Preference Alignment

Paper • 2411.19309 • Published Nov 28, 2024 • 48
On Domain-Specific Post-Training for Multimodal Large Language Models

Paper • 2411.19930 • Published Nov 29, 2024 • 30
Reverse Thinking Makes LLMs Stronger Reasoners

Paper • 2411.19865 • Published Nov 29, 2024 • 23
Large Language Model-Brained GUI Agents: A Survey

Paper • 2411.18279 • Published Nov 27, 2024 • 32
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

Paper • 2411.15139 • Published Nov 22, 2024 • 15
ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 89
Star Attention: Efficient LLM Inference over Long Sequences

Paper • 2411.17116 • Published Nov 26, 2024 • 56
MH-MoE: Multi-Head Mixture-of-Experts

Paper • 2411.16205 • Published Nov 25, 2024 • 29
Patience Is The Key to Large Language Model Reasoning

Paper • 2411.13082 • Published Nov 20, 2024 • 7
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Paper • 2412.07760 • Published Dec 10, 2024 • 56
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Paper • 2412.09619 • Published Dec 12, 2024 • 28
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Paper • 2501.02976 • Published Jan 6 • 56
LTX-Video: Realtime Video Latent Diffusion

Paper • 2501.00103 • Published Dec 30, 2024 • 47
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 48
On the Compositional Generalization of Multimodal LLMs for Medical Imaging

Paper • 2412.20070 • Published Dec 28, 2024 • 47
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published Dec 30, 2024 • 42
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 105
VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published Jan 10 • 73
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published Jan 10 • 66
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 99
Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published Jan 8 • 93
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Paper • 2501.09012 • Published Jan 15 • 10
VideoAuteur: Towards Long Narrative Video Generation

Paper • 2501.06173 • Published Jan 10 • 34
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning

Paper • 2501.06458 • Published Jan 11 • 32
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Paper • 2504.00906 • Published Apr 1 • 25
Distilling LLM Agent into Small Models with Retrieval and Code Tools

Paper • 2505.17612 • Published May 23 • 81
VeriThinker: Learning to Verify Makes Reasoning Model Efficient

Paper • 2505.17941 • Published May 23 • 25
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

Paper • 2505.17225 • Published May 22 • 65
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

Paper • 2505.16938 • Published May 22 • 121
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

Paper • 2505.16933 • Published May 22 • 34
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Paper • 2505.14810 • Published May 20 • 63
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

Paper • 2505.16990 • Published May 22 • 21
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21 • 55
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 96
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 134
Neurosymbolic Diffusion Models

Paper • 2505.13138 • Published May 19 • 34
The Aloe Family Recipe for Open and Specialized Healthcare LLMs

Paper • 2505.04388 • Published May 7 • 27
Latent Flow Transformer

Paper • 2505.14513 • Published May 20 • 29
Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published May 17 • 122
AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19 • 82
Model Merging in Pre-training of Large Language Models

Paper • 2505.12082 • Published May 17 • 39
Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published May 16 • 57
Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity

Paper • 2505.11107 • Published May 16 • 29
Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Paper • 2505.10185 • Published May 15 • 26
Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware

Paper • 2505.09601 • Published May 14 • 5
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging

Paper • 2505.05464 • Published May 8 • 11
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 186
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 119
ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

Paper • 2508.07050 • Published 27 days ago • 114
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published 29 days ago • 122
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published 26 days ago • 89
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

Paper • 2508.04825 • Published 30 days ago • 58
MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published 25 days ago • 41
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Paper • 2508.05635 • Published 29 days ago • 72
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28 • 81
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

Paper • 2507.23682 • Published Jul 31 • 23
GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 131
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 249
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 153
