Christopher Schröder

cschroeder

https://github.com/webis-de/small-text

AI & ML interests

NLP, Active Learning, Text Representations, PyTorch

Recent Activity

updated a model 4 months ago

small-text/tiny-distilroberta-base

published a model 4 months ago

small-text/tiny-distilroberta-base

upvoted a paper about 2 months ago

The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

Paper • 2510.13996 • Published Oct 15 • 8

upvoted 2 papers 9 months ago

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published Mar 7 • 79

NeoBERT: A Next-Generation BERT

Paper • 2502.19587 • Published Feb 26 • 38

upvoted a collection about 1 year ago

Models for dataset curation

9 items • Updated Dec 5, 2024 • 17

upvoted a paper about 1 year ago

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

Paper • 2406.09206 • Published Jun 13, 2024 • 1

upvoted 2 collections about 1 year ago

OpenCulture

A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 131

EU20-Benchmarks

Evaluation Benchmarks for 20 European languages. • 5 items • Updated Oct 11, 2024 • 9

upvoted an article about 1 year ago

AI Policy @🤗: Open ML Considerations in the EU AI Act

Article • Jul 24, 2023 • 2

upvoted 5 papers over 1 year ago

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Paper • 2408.13233 • Published Aug 23, 2024 • 24

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Paper • 2407.13623 • Published Jul 18, 2024 • 56

RETVec: Resilient and Efficient Text Vectorizer

Paper • 2302.09207 • Published Feb 18, 2023 • 3

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Paper • 2407.03963 • Published Jul 4, 2024 • 19

AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets

Paper • 2404.05623 • Published Apr 8, 2024 • 3

upvoted a collection over 1 year ago

🎧AI Podcasts and Talks!

🤗Cool stuff to listen to at any time! • 10 items • Updated Oct 6, 2023 • 5

upvoted a paper over 1 year ago

Small-Text: Active Learning for Text Classification in Python

Paper • 2107.10314 • Published Jul 21, 2021 • 1