Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models Paper • 2406.09206 • Published Jun 13 • 1
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated 18 days ago • 117
EU20-Benchmarks Collection Evaluation Benchmarks for 20 European languages. • 5 items • Updated Oct 11 • 4
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published Aug 23 • 21
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
RETVec: Resilient and Efficient Text Vectorizer Paper • 2302.09207 • Published Feb 18, 2023 • 3
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs Paper • 2407.03963 • Published Jul 4 • 15
AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets Paper • 2404.05623 • Published Apr 8 • 3
AI Podcasts and Talks! Collection Cool stuff to listen to at any time! • 10 items • Updated Oct 6, 2023 • 5
Small-Text: Active Learning for Text Classification in Python Paper • 2107.10314 • Published Jul 21, 2021 • 1