Data Is Better Together

community
Activity Feed

AI & ML interests

Building better datasets together

Recent Activity

data-is-better-together's activity

davanstrien 
posted an update about 12 hours ago
davanstrien 
posted an update 1 day ago
sayakpaul 
posted an update 2 days ago
We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs our users can turn, fine-tuning, and more 🔥

5-6 GB for HunyuanVideo; the sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen
davanstrien 
posted an update 2 days ago
🌍 Big step for multilingual AI data!

The Hugging Face community has rated educational content in languages spoken by 1.6 billion people! New additions:
• Japanese
• Italian
• Old High German

Learn more and contribute: https://huggingface.co/blog/davanstrien/fineweb2-community

These ratings can help enhance training data for major world languages.
davidberenstein1957 
posted an update 3 days ago
burtenshaw 
posted an update 3 days ago
A manic few days in open-source AI, with game-changing developments all over the place. Here's a roundup of the resources:

- The science team at @huggingface reproduced and open-sourced DeepSeek-R1. https://github.com/huggingface/open-r1
- @qwen released a series of models with a 1-million-token context! https://qwenlm.github.io/blog/qwen2.5-1m/
- SmolVLM got even smaller, with completely new variants at 256M and 500M. https://huggingface.co/blog/smolervlm

There's so much you could do with these developments, especially by combining them into agentic applications or fine-tuning them on your use case.
burtenshaw 
posted an update 5 days ago
Hey 👋

I'm helping out with some community research to learn about the AI community. If you want to join the conversation, head over to the community discussion I started on the most influential model since BERT:

OSAIResearchCommunity/README#2
burtenshaw 
posted an update 5 days ago
📣 Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TL;DR: it's a quiz that uses a dataset to make questions and save answers.

Here's how it works:

- make a dataset of multiple choice questions
- duplicate the Space and set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset
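
The first step above (making a dataset of multiple-choice questions) could look roughly like this. It's only a minimal sketch: the field names (`question`, `choices`, `correct_index`) and the helper functions are hypothetical and not necessarily what burtenshaw/dataset_quiz expects, so check the Space before uploading anything.

```python
import json


def make_question(question, choices, correct_index):
    """Build one multiple-choice record (hypothetical schema, see note above)."""
    choices = list(choices)
    if not 0 <= correct_index < len(choices):
        raise ValueError("correct_index must point at one of the choices")
    return {"question": question, "choices": choices, "correct_index": correct_index}


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def score(records, answers):
    """Count how many submitted answer indices match the correct ones."""
    return sum(1 for r, a in zip(records, answers) if r["correct_index"] == a)


# Two illustrative questions; replace these with your own study material.
questions = [
    make_question(
        "Which format stores one JSON object per line?",
        ["CSV", "JSON Lines", "XML"],
        1,
    ),
    make_question(
        "How many choices does this question have?",
        ["Two", "Three"],
        0,
    ),
]
```

Saving the output of `to_jsonl(questions)` to a file gives you something you can upload as a dataset repo, and `score(questions, answers)` is one way to check a set of quiz answers locally.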

I made this to get ready for the agents course, but I hope it's useful for your projects too!

Quiz app: burtenshaw/dataset_quiz

Dataset with questions: burtenshaw/exam_questions

Agents course we're working on: https://huggingface.co/agents-course
burtenshaw 
posted an update 6 days ago
AI was built on side projects!
burtenshaw 
posted an update 8 days ago
🚧 Work in Progress! 🚧

👷‍♀️ We're working hard on getting the official agents course ready for the 50,000 students who have signed up.

If you want to contribute to the discussion, I started these community posts. Looking forward to hearing from you:

- smolagents unit in the agents course - agents-course/README#7
- LlamaIndex unit in the agents course - agents-course/README#6
- LangChain and LangGraph unit in the agents course - agents-course/README#5
- Real world use cases in the agents course - agents-course/README#8


davidberenstein1957 
posted an update 8 days ago
davidberenstein1957 
posted an update 12 days ago
nataliaElv 
posted an update 13 days ago
New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub. 

Any feedback for improvements is welcome!

https://huggingface.co/learn/nlp-course/chapter10
burtenshaw 
posted an update 13 days ago