🧠 Reasoning datasets (Collection): Datasets with reasoning traces for math and code, released by the community • 24 items • Updated May 19
Article: Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Apr 15, 2024
Article: Docmatix - a huge dataset for Document Visual Question Answering • Jul 18, 2024
Article: Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20, 2024
Article: Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality • Jun 24, 2024
Article: Experimenting with Automatic PII Detection on the Hub using Presidio • Jul 10, 2024
Article: How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o • May 31, 2024
Article: Synthetic dataset generation techniques: generating custom sentence similarity data • May 23, 2024