Andrea Soria

asoria

AI & ML interests

Maintainer of 🤗 Datasets: Data processing

Recent Activity

updated a dataset 32 minutes ago
asoria/crawl4ai_repo
liked a dataset about 23 hours ago
katossky/wine-recognition
liked a dataset about 23 hours ago
duckdb-nsql-hub/sql-console-prompt

Articles

Organizations

Hugging Face, BigScience Data, Datasets Maintainers, Blog-explorers, Enterprise Explorers, ZeroGPU Explorers, Datasets examples, Women on Hugging Face, AI Developers from Latin America, Datasets Topics

asoria's activity

upvoted an article 14 days ago
upvoted an article about 1 month ago
Article

LoRA training scripts of the world, unite!

• 45
upvoted 2 articles about 2 months ago
Article

Improving Parquet Dedupe on Hugging Face Hub

• 31
upvoted 4 articles 2 months ago
Article

Introducing BERTopic Integration with Hugging Face Hub

• 7
Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

• 168
Article

Introducing the SQL Console on Datasets

• 20
upvoted 2 articles 3 months ago
Article

Fine-Tuning Gemma Models in Hugging Face

• 25
Article

The 5 Most Under-Rated Tools on Hugging Face

• 85
upvoted 4 articles 4 months ago
Article

SmolLM - blazingly fast and remarkably powerful

• 279
Article

Docmatix - a huge dataset for Document Visual Question Answering

• 68
Article

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

• 67
Article

Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality

• 33
upvoted 2 articles 5 months ago
Article

Experimenting with Automatic PII Detection on the Hub using Presidio

• 24
Article

Announcing New Dataset Search Features

• 22
upvoted 2 articles 6 months ago
Article

How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o

By chilijung • 11
Article

Synthetic dataset generation techniques: generating custom sentence similarity data

By davanstrien • 15
upvoted 2 articles 7 months ago
Article

Synthetic data: save money, time and carbon with open source

• 51
Article

πŸ¦™βš—οΈ Using Llama3 and distilabel to build fine-tuning datasets

By dvilasuero • 73