blab

university

AI & ML interests

None defined yet.

Recent Activity

mmarone

authored a paper 13 days ago

mmBERT: A Modern Multilingual Encoder with Annealed Language Learning

Paper • 2509.06888 • Published Sep 8 • 12

orionweller

authored 2 papers about 1 month ago

mmBERT: A Modern Multilingual Encoder with Annealed Language Learning

Paper • 2509.06888 • Published Sep 8 • 12

On the Theoretical Limitations of Embedding-Based Retrieval

Paper • 2508.21038 • Published Aug 28 • 19

mmarone

updated a model about 2 months ago

blab-jhu/mmbert-checkpoints

Updated Aug 24

orionweller

updated a model about 2 months ago

blab-jhu/mmbert-checkpoints

Updated Aug 24

orionweller

updated a dataset about 2 months ago

jhu-clsp/mmBERT-pretrain-p2-fineweb2-remaining

Updated 1 day ago • 5.3k

mmarone

authored a paper 3 months ago

Seq vs Seq: An Open Suite of Paired Encoders and Decoders

Paper • 2507.11412 • Published Jul 15 • 28

mmarone

authored a paper 6 months ago

Certified Mitigation of Worst-Case LLM Copyright Infringement

Paper • 2504.16046 • Published Apr 22 • 13

orionweller

authored a paper 7 months ago

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning

Paper • 2503.04973 • Published Mar 6 • 25

mmarone

authored 4 papers 7 months ago

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Paper • 2404.03862 • Published Apr 5, 2024

AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees

Paper • 2404.08417 • Published Apr 12, 2024 • 1

Data Portraits: Recording Foundation Model Training Data

Paper • 2303.03919 • Published Mar 6, 2023

Dated Data: Tracing Knowledge Cutoffs in Large Language Models

Paper • 2403.12958 • Published Mar 19, 2024

orionweller

authored 7 papers 10 months ago

CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation

Paper • 2406.17186 • Published Jun 24, 2024 • 2

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Paper • 2409.11136 • Published Sep 17, 2024 • 24

Team members 5

blab-jhu's activity