Kenneth Hamilton (ZennyKenny)

AI & ML interests

Development and Ops for LLMs and CV.

Recent Activity

updated a dataset about 1 hour ago
microsoft/orca-agentinstruct-1M-v1
New activity about 1 hour ago
microsoft/orca-agentinstruct-1M-v1
updated a Space about 14 hours ago
ZennyKenny/VocabSova

ZennyKenny's activity

reacted to jsulz's post with πŸš€ about 14 hours ago
In August, the XetHub team joined Hugging Face
- https://huggingface.co/blog/xethub-joins-hf - and we’ve been rolling up our sleeves to bring the best of both worlds together. We started with a deep dive into the current state of files stored with Git LFS on the Hub.

Getting this information was no small feat. We had to:
* Analyze a complete database dump of all repositories and files stored in Git LFS across Hugging Face.
* Parse through metadata on file sizes and types to accurately map the storage breakdown across Spaces, Models, and Datasets.
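The second step can be sketched as a simple aggregation over the dump's file records. This is a minimal illustration only: the record fields (`repo_type`, `path`, `size`) are hypothetical and not the actual schema of the analysis.

```python
from collections import defaultdict

def storage_breakdown(records):
    """Aggregate total bytes by (repo_type, file_extension).

    Each record is assumed to look like
    {"repo_type": "model", "path": "weights.safetensors", "size": 1234}.
    """
    totals = defaultdict(int)
    for rec in records:
        path = rec["path"]
        ext = path.rsplit(".", 1)[-1] if "." in path else "(none)"
        totals[(rec["repo_type"], ext)] += rec["size"]
    return dict(totals)

records = [
    {"repo_type": "model", "path": "weights.safetensors", "size": 500},
    {"repo_type": "model", "path": "weights.bin", "size": 300},
    {"repo_type": "dataset", "path": "train.parquet", "size": 200},
]
print(storage_breakdown(records))
```

Grouping by repo type and extension is what lets the breakdown across Spaces, Models, and Datasets fall out of a single pass over the dump.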

You can read more about the findings (with some jaw-dropping stats + charts) here https://www.linkedin.com/feed/update/urn:li:activity:7244486280351285248
reacted to davanstrien's post with πŸš€ 1 day ago
reacted to ArthurZ's post with πŸ”₯ 2 days ago
reacted to fdaudens's post with πŸš€ 4 days ago
πŸͺ„ MagicQuill: AI that reads your mind for image edits! Point at what bugs you, and it suggests the perfect fixes. No more manual editing headaches. Try it here: AI4Editing/MagicQuill
posted an update 3 months ago
Very excited to have made the list and been invited to OpenAI DevDay 2024 at the London event 30 October! Looking forward to seeing what the future of AI dev holds, connecting with other professionals in the field, and advocating for open source AI!

https://openai.com/devday/
reacted to Taylor658's post with πŸ‘ 3 months ago
πŸ’‘Andrew Ng recently gave a strong defense of Open Source AI models at Stanford GSB, arguing against legislative efforts in the US and the EU to restrict innovation in Open Source AI.

πŸŽ₯See video below
https://youtu.be/yzUdmwlh1sQ?si=bZc690p8iubolXm_
replied to Taylor658's post 3 months ago

As usual, Andrew Ng states his position cogently and concisely for people who may not be familiar with the memes of the AI world.

Personally, I think a government committee or agency focused on AI could be a good thing. But having seen regulatory body after regulatory body in the United States fumble well-meaning attempts to stay informed and turn those attempts into suffocating legislation, the only realistic position to advocate seems to be no regulation whatsoever: any foot-in-the-door oversight or law will simply be warped into red tape and bureaucracy by the ever-changing winds of the election cycle.

replied to KingNish's post 3 months ago
replied to merve's post 4 months ago
reacted to merve's post with πŸ”₯ 4 months ago
reacted to severo's post with πŸš€ 4 months ago
[New tool] Follow interesting ML persons πŸ‘©β€πŸŽ¨ πŸ‘¨β€πŸŽ€ πŸ‘©β€πŸ« with Followgraph

severo/followgraph

Please try it and tell me if it helped you discover high-quality content πŸ‘ πŸ‘Ž

I repurposed "Followgraph for Mastodon" (https://followgraph.vercel.app/).

My new follows: @TheBloke @mlabonne @teknium @KnutJaegersberg @SkalskiP @AmelieSchreiber @lbourdois @ceyda @andrewyng @Pclanglais @karpathy

And you?
reacted to nroggendorff's post with 😎 4 months ago
Datasets are down; I offer a solution.

# In a shell: clone the dataset repo locally (Git LFS fetches the data files)
git lfs install
git clone https://huggingface.co/datasets/{dataset/id}

# In Python: load the dataset from the cloned folder
from datasets import load_dataset

dataset = load_dataset("id")  # path to the cloned directory
reacted to qnguyen3's post with πŸ”₯ 5 months ago
reacted to dvilasuero's post with πŸš€ 5 months ago
Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: being a launch partner for Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference-tuning datasets.

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger, a larger team with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and AmΓ©lie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!
reacted to fdaudens's post with πŸ”₯ 6 months ago
Impressed by the work of @guipenedo @hynky @loubnabnl @anton-l @craffel @lvwerra @thomwolf on FineWeb.

LLMs are only as good as the data they have been trained on, but the crucial aspect of pretraining data remains obscure. Our approach lifts the veil on building high-quality pretraining datasets by sharing every detail about this process to enable a wider community to build on top of it.

- The FineWeb-Edu dataset, which outperforms all openly accessible web datasets in a number of educational benchmarks. We built it by developing a quality classifier using annotations generated by an LLM.

- A new technical report explaining in detail how to create a large, high-quality web-scale dataset for LLM pretraining such as FineWeb.

πŸ‘‰ HuggingFaceFW/blogpost-fineweb-v1
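The classifier idea behind FineWeb-Edu can be sketched in miniature: documents carry LLM-generated educational scores, and a cutoff keeps only the educational ones. The function name, score field, and sample texts below are illustrative stand-ins, not FineWeb's actual pipeline, which trains a learned classifier on such annotations.

```python
def filter_by_edu_score(docs, threshold=3):
    """Keep documents whose LLM-annotated educational score (e.g. 0-5)
    meets the threshold. A stand-in for a learned quality classifier."""
    return [d["text"] for d in docs if d["edu_score"] >= threshold]

docs = [
    {"text": "Intro to photosynthesis...", "edu_score": 4},
    {"text": "Buy cheap widgets now!!!", "edu_score": 1},
    {"text": "A proof of the triangle inequality...", "edu_score": 5},
]
print(filter_by_edu_score(docs))
```

In practice the LLM scores a sample of pages, a small classifier is trained on those labels, and the classifier (far cheaper than the LLM) filters the full web corpus.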
replied to alielfilali01's post 6 months ago

Great achievement, congratulations to the entire team!

reacted to alielfilali01's post with πŸ”₯ 6 months ago
The 100 models milestone on the OALL/Open-Arabic-LLM-Leaderboard is successfully reached within 10 days after the leaderboard's release πŸ₯³

meta-llama/Meta-Llama-3-70B-Instruct is still the king of the leaderboard πŸ‘‘ with a 3.46-point lead over its successor CohereForAI/c4ai-command-r-plus, who took 2nd place πŸ₯ˆ from his younger brother CohereForAI/c4ai-command-r-v01, who today lives on the 5th floor just behind Ashmal/MBZUAI-oryx in 3rd place πŸ₯‰ (AFAIK an experimental model from MBZUAI) and https://huggingface.co/core42/jais-30b-chat-v3 in 4th place, from Core42.

PS : I should consider a career in sports commentary πŸ˜‚
Would you recommend me to BeIN Sports πŸ˜€ ?
  • 1 reply
Β·
reacted to davanstrien's post with πŸ”₯ 6 months ago
posted an update 6 months ago
Thanks to the incredible collaboration of 14 community annotators, @davanstrien of HF, and @dvilasuero et al. of Argilla, DIBT (https://huggingface.co/DIBT) is pleased to make available a Russian-language dataset of 500 of the best curated LLM prompts, translated into Russian and available for use: https://huggingface.co/datasets/DIBT/MPEP_RUSSIAN.

More to come from the MPEP initiative! Interested in annotating or leading a language team? https://github.com/huggingface/data-is-better-together/tree/main/prompt_translation
  • 2 replies
Β·
reacted to davanstrien's post with πŸ”₯ 7 months ago
Only 14 languages have DPO preference-style datasets on the Hugging Face Hub (https://huggingface.co/spaces/DIBT/preference_data_by_language). Let's improve that! How?

The Cohere For AI Aya dataset CohereForAI/aya_dataset has human-annotated prompt-completion pairs in 71 languages. We can use this to create DPO datasets for more languages!

Using Aya's prompt/response pairs as a starting point, we can use an LLM to generate an additional response to each prompt. We then use an LLM judge to rank the responses.

βœ… In many languages, human responses may be better than LLM-generated ones, but that assumption is worth checking per language.
πŸš€ We use Argilla's distilabel library to push data to Argilla for validation. This also lets us determine whether an LLM judge is effective for different languages.
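The judging step above boils down to turning scored response pairs into preference data. Here is a minimal pure-Python sketch of that final conversion; the record fields are assumptions for illustration, and the real pipeline uses distilabel rather than hand-rolled code.

```python
def build_dpo_pairs(rows):
    """Convert judged rows into DPO-style (prompt, chosen, rejected) examples.

    Each row is assumed to hold a prompt, two candidate responses, and the
    judge's score for each. Ties are skipped: they carry no preference signal.
    """
    pairs = []
    for r in rows:
        if r["score_a"] == r["score_b"]:
            continue  # no winner, no training signal
        if r["score_a"] > r["score_b"]:
            chosen, rejected = r["response_a"], r["response_b"]
        else:
            chosen, rejected = r["response_b"], r["response_a"]
        pairs.append({"prompt": r["prompt"], "chosen": chosen, "rejected": rejected})
    return pairs

rows = [
    {"prompt": "Wat is DPO?", "response_a": "Direct Preference Optimization.",
     "response_b": "Geen idee.", "score_a": 9, "score_b": 2},
    {"prompt": "Tel tot drie.", "response_a": "1, 2, 3.",
     "response_b": "1, 2, 3.", "score_a": 5, "score_b": 5},
]
print(build_dpo_pairs(rows))
```

The resulting prompt/chosen/rejected columns are exactly the shape DPO-style trainers expect, which is why a judge over Aya's pairs is enough to bootstrap a dataset like DIBT/aya_dutch_dpo.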

As an example of what this pipeline produces:
- https://huggingface.co/datasets/DIBT/aya_dutch_dpo a DPO style dataset for Dutch using Llama 3 as a generator/judge LM.
- An annotation Space that anyone with a HF account can contribute to: https://dibt-demo-argilla-space.hf.space/dataset/924ef8a8-a447-4563-8806-0e2a668a5314/annotation-mode?page=1&status=pending

As part of Data is Better Together we want to build more DPO datasets. Join us here: https://github.com/huggingface/data-is-better-together#4-dpoorpo-datasets-for-more-languages πŸ€—