
LLLeo Li

LLLeo612


AI & ML interests

None yet

Recent Activity



upvoted a paper 3 months ago

Lost in Stories: Consistency Bugs in Long Story Generation by LLMs

Paper • 2603.05890 • Published Mar 6 • 93

upvoted a paper 4 months ago

LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth

Paper • 2602.07962 • Published Feb 8 • 24

liked a dataset 5 months ago

JingkunAn/TraceSpatial-Bench

Viewer • Updated Jan 7 • 100 • 957 • 5

updated a model 6 months ago

LLLeo612/MyAwesomeModel-TestRepo

Feature Extraction • Updated Dec 11, 2025 • 1

published a model 6 months ago

LLLeo612/MyAwesomeModel-TestRepo

Feature Extraction • Updated Dec 11, 2025 • 1

upvoted a paper 6 months ago

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Paper • 2512.07461 • Published Dec 8, 2025 • 80

upvoted a paper 8 months ago

LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

Paper • 2510.08211 • Published Oct 9, 2025 • 23

reacted to AdinaY's post with 🔥 8 months ago

Post

3544

BAAI has released ROME🔥, evaluating 30+ large reasoning models on text and visual reasoning

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions (2509.17177)

✨Tests visual reasoning, not just recognition
✨Covers capability × alignment × safety × efficiency
✨More transparent & reliable (less data contamination)
✨Helps make real-world deployment choices

upvoted a paper 8 months ago

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

Paper • 2509.17177 • Published Sep 21, 2025 • 13

liked a dataset 11 months ago

JingkunAn/RefSpatial

Viewer • Updated Feb 4 • 800 • 1.84k • 21

upvoted 2 papers 12 months ago

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5, 2025 • 83

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Paper • 2506.04308 • Published Jun 4, 2025 • 43

upvoted 2 papers about 1 year ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 173

Personalize Anything for Free with Diffusion Transformer

Paper • 2503.12590 • Published Mar 16, 2025 • 44

authored a paper over 1 year ago

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement

Paper • 2502.16776 • Published Feb 24, 2025 • 6

upvoted 2 papers over 1 year ago

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement

Paper • 2502.16776 • Published Feb 24, 2025 • 6

Thus Spake Long-Context Large Language Model

Paper • 2502.17129 • Published Feb 24, 2025 • 73

liked a model over 1 year ago

Goodfire/Llama-3.1-8B-Instruct-SAE-l19

Updated Jan 11, 2025 • 92 • 43

New activity in SafeMTData/SafeMTData over 1 year ago

[bot] Conversion to Parquet

#1 opened over 1 year ago by parquet-converter

authored a paper over 1 year ago

VLSBench: Unveiling Visual Leakage in Multimodal Safety

Paper • 2411.19939 • Published Nov 29, 2024 • 10