In a Training Loop 🔄

Lj V. Miranda PRO

ljvmiranda921

https://ljvmiranda921.github.io

AI & ML interests

NLP - multilinguality, data-centric AI

Recent Activity

updated a model about 2 hours ago

ljvmiranda921/msde-sft-dev

updated a dataset about 8 hours ago

ljvmiranda921/msde-T1-ar

updated a dataset about 9 hours ago

ljvmiranda921/msde-T1-id

upvoted 2 articles 4 months ago

Article

There is no such thing as a tokenizer-free lunch

Sep 25, 2025 • 95

Article

An Analysis of Multilingual Models on Hugging Face

Sep 18, 2025 • 4

upvoted an article 5 months ago

Article

🇵🇭 FilBench - Can LLMs Understand and Generate Filipino?

Aug 12, 2025 • 22

upvoted a collection 7 months ago

Reward Bench 2

Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated 17 days ago • 16

upvoted 2 papers 8 months ago

R3: Robust Rubric-Agnostic Reward Models

Paper • 2505.13388 • Published May 19, 2025 • 11

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 98

upvoted a paper 9 months ago

The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks

Paper • 2504.15521 • Published Apr 22, 2025 • 64

upvoted a collection 10 months ago

SEA-VL: Multicultural VL Dataset for Southeast Asia

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • 3 items • Updated Apr 12, 2025 • 20

upvoted a paper 10 months ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10, 2025 • 101

upvoted 3 papers about 1 year ago

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 10

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376

upvoted 3 collections about 1 year ago

Multilingual LLM Evaluation

Multilingual Evaluation Benchmarks • 8 items • Updated Jul 31, 2025 • 28

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark S

SEACrowd is a community movement project aimed at centralizing and standardizing AI resources for Southeast Asian languages, cultures, and/or regions. • 3 items • Updated Jun 18, 2024 • 8

OLMo 2

Artifacts for the OLMo 2 release. • 35 items • Updated 17 days ago • 151

upvoted a paper about 1 year ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 67

upvoted a collection about 1 year ago

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 33 items • Updated 17 days ago • 96

upvoted a paper about 1 year ago

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Paper • 2410.19133 • Published Oct 24, 2024 • 11

upvoted a collection about 1 year ago

Multilingual RewardBench (M-RewardBench) [ACL 2025 Main]

Multilingual Reward Model Evaluation Dataset and Results • 3 items • Updated May 15, 2025 • 4

upvoted a paper about 1 year ago

M-RewardBench: Evaluating Reward Models in Multilingual Settings

Paper • 2410.15522 • Published Oct 20, 2024 • 12