yuchenlin (Bill Yuchen Lin)

upvoted a paper 9 months ago

TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

Paper • 2505.14625 • Published May 20, 2025 • 13

upvoted a paper about 1 year ago

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published Feb 3, 2025 • 21

upvoted a collection about 1 year ago

Magpie Reasoning Datasets

Collection

Reasoning datasets built by Magpie and its friends! • 8 items • Updated Jan 27, 2025 • 11

upvoted a paper over 1 year ago

On Memorization of Large Language Models in Logical Reasoning

Paper • 2410.23123 • Published Oct 30, 2024 • 18

upvoted a collection over 1 year ago

MagpieLM

Collection

Aligning LMs with Fully Open Recipe + Synthetic Data Generated from Open-Source LMs. • 9 items • Updated Jan 13, 2025 • 17

upvoted an article over 1 year ago

ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models

Article • Jul 27, 2024 • 34

upvoted 2 collections over 1 year ago

Magpie-Qwen2 Datasets

Collection

Dataset built with Qwen2 72B and Qwen2 7B. • 6 items • Updated Jan 13, 2025 • 10

Zebra Logic Bench

Collection

ZebraLogic Bench: Testing the Limits of LLMs in Logical Reasoning • 4 items • Updated Dec 23, 2025 • 5

upvoted 4 papers over 1 year ago

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Paper • 2407.10457 • Published Jul 15, 2024 • 24

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Paper • 2406.18495 • Published Jun 26, 2024 • 13

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Paper • 2406.11069 • Published Jun 16, 2024 • 14

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Paper • 2406.04770 • Published Jun 7, 2024 • 28

upvoted a collection over 1 year ago

WildBench

Collection

4 items • Updated Dec 23, 2025 • 6

upvoted a paper over 1 year ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 71

upvoted 3 papers almost 2 years ago

Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents

Multi-LoRA Composition for Image Generation

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

upvoted 2 papers about 2 years ago

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

Paper • 2312.01552 • Published Dec 4, 2023 • 32

Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs

Paper • 2311.05657 • Published Nov 9, 2023 • 30

upvoted a paper over 2 years ago

Mind2Web: Towards a Generalist Agent for the Web

Paper • 2306.06070 • Published Jun 9, 2023 • 20