Article Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK By davidberenstein1957 • about 8 hours ago • 10
Drowning in Documents: Consequences of Scaling Reranker Inference Paper • 2411.11767 • Published 3 days ago • 16
Article Halo: Open Source Health Tracking with Wearables By cyrilzakka • 2 days ago • 61
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • 8 days ago • 94
Training with Prompts Collection See the Training with Prompts documentation for more details: https://sbert.net/examples/training/prompts/README.html • 5 items • Updated 14 days ago • 3
Article Releasing Common Corpus: the largest public domain dataset for training LLMs By Pclanglais • Mar 20 • 17
Model2Vec base models Collection These are the Minishlab Model2Vec base models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 7 items • Updated 23 days ago • 8
POTION Collection These are the flagship POTION models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 3 items • Updated 22 days ago • 6
Article Releasing Outlines-core 0.1.0: structured generation in Rust and Python • about 1 month ago • 41
Article Transformers.js v3: WebGPU support, new models & tasks, and more… • about 1 month ago • 63
Granite 3.0 Language Models Collection A series of language models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 17 days ago • 89
MedEmbed: Embedding Models for Medical Domain Collection GitHub -> https://github.com/abhinand5/MedEmbed • 4 items • Updated about 1 month ago • 7
Article MedEmbed: Fine-Tuned Embedding Models for Medical / Clinical IR By abhinand • Oct 20 • 30