Voxel51 (verified company)

AI & ML interests

Visual AI, computer vision, multimodal AI, data-centric AI

Recent Activity

abhishek posted an update about 2 months ago
🎉 SUPER BLACK FRIDAY DEAL 🎉

Train almost any model on a variety of tasks, such as LLM finetuning, text classification/regression, summarization, question answering, image classification/regression, object detection, tabular data, and more, for FREE using AutoTrain locally. 🔥
https://github.com/huggingface/autotrain-advanced
abhishek posted an update 2 months ago
INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models just got even easier!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on the Hugging Face Hub using Python, running on Hugging Face servers. Choose from a number of GPU flavors, millions of model-dataset pairs, and 10+ tasks 🤗

To try it, install autotrain-advanced using pip. If you install with --no-deps, pip will skip the dependencies and you'll need to install some of them by hand.

pip install autotrain-advanced

GitHub repo: https://github.com/huggingface/autotrain-advanced
abhishek posted an update 5 months ago
🚨 NEW TASK ALERT 🚨
Extractive Question Answering: because sometimes generative is not all you need 😉
AutoTrain is the only open-source, no-code solution offering so many tasks across different modalities. Current task count: 23 🚀
Check out the blog post on getting started with this task: https://huggingface.co/blog/abhishek/extractive-qa-autotrain
harpreetsahota posted an update 8 months ago
The Coachella of Computer Vision, CVPR, is right around the corner. In anticipation of the conference, I curated a dataset of the papers.

I'll have a technical blog post out tomorrow doing some analysis on the dataset, but I'm so hyped that I wanted to get it out to the community ASAP.

The dataset consists of the following fields:

- An image of the first page of the paper
- title: The title of the paper
- authors_list: The list of authors
- abstract: The abstract of the paper
- arxiv_link: Link to the paper on arXiv
- other_link: Link to the project page, if found
- category_name: The primary category of this paper, according to the [arXiv taxonomy](https://arxiv.org/category_taxonomy)
- all_categories: All categories this paper falls into, according to arXiv taxonomy
- keywords: Extracted using GPT-4o

Here's how I created the dataset 👇🏼

Generic code for building this dataset can be found [here](https://github.com/harpreetsahota204/CVPR-2024-Papers).

This dataset was built using the following steps:

- Scrape the CVPR 2024 website for accepted papers
- Use DuckDuckGo to search for a link to the paper's abstract on arXiv
- Use arXiv.py (python wrapper for the arXiv API) to extract the abstract and categories, and download the pdf for each paper
- Use pdf2image to save an image of each paper's first page
- Use GPT-4o to extract keywords from the abstract
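The arXiv-lookup and record-assembly steps above can be sketched roughly as follows. The helper names are illustrative (the linked GitHub repo has the actual code), and the scraping, pdf2image, and GPT-4o keyword steps are elided:

```python
# Illustrative sketch of the pipeline's arXiv step, using the arxiv.py wrapper.
# Helper names are made up for this sketch; see the linked repo for real code.

def make_record(title, authors, abstract, arxiv_link, categories, keywords):
    """Assemble one dataset row from already-fetched paper metadata."""
    return {
        "title": title,
        "authors_list": authors,
        "abstract": abstract,
        "arxiv_link": arxiv_link,
        "category_name": categories[0] if categories else None,
        "all_categories": categories,
        "keywords": keywords,
    }

def fetch_paper(title):
    """Look up a paper on arXiv by title and build a record from the result."""
    import arxiv  # pip install arxiv

    search = arxiv.Search(query=f'ti:"{title}"', max_results=1)
    result = next(arxiv.Client().results(search))
    return make_record(
        title=result.title,
        authors=[a.name for a in result.authors],
        abstract=result.summary,
        arxiv_link=result.entry_id,
        categories=list(result.categories),
        keywords=[],  # later filled in by the GPT-4o keyword step
    )
```

`fetch_paper` hits the network; `make_record` is pure, so the row schema can be checked without any downloads.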

Voxel51/CVPR_2024_Papers
abhishek posted an update 8 months ago
🚨 NEW TASK ALERT 🚨
🎉 AutoTrain now supports Object Detection! 🎉
Transform your projects with these powerful new features:
🔹 Fine-tune any supported model from the Hugging Face Hub
🔹 Seamless logging with TensorBoard or W&B
🔹 Support for local and hub datasets
🔹 Configurable training for tailored results
🔹 Train locally or leverage Hugging Face Spaces
🔹 Deployment-ready with API inference or Hugging Face endpoints
AutoTrain: https://hf.co/autotrain
abhishek posted an update 9 months ago
🚀🚀🚀🚀 Introducing AutoTrain Configs! 🚀🚀🚀🚀
Now you can train models using YAML config files! 💥 These configs are easy to understand and not at all overwhelming, so even someone with almost zero machine learning knowledge can train state-of-the-art models without writing any code. Check out the example configs in the config directory of the autotrain-advanced GitHub repo, and feel free to share your own configs by creating a pull request 🤗
GitHub repo: https://github.com/huggingface/autotrain-advanced
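As a rough sketch of what a config-driven run looks like: the field names below are modeled on the example configs in the repo and may not match the current schema exactly, and the model and dataset ids are placeholders, not a recommendation.

```python
# Hedged sketch: write a minimal AutoTrain config, then point the CLI at it.
# Field names are illustrative; check the repo's config directory for the
# exact schema of each task.
from pathlib import Path

CONFIG = """\
task: llm-sft
base_model: meta-llama/Meta-Llama-3-8B-Instruct
project_name: my-autotrain-project
data:
  path: HuggingFaceH4/no_robots
  train_split: train
params:
  epochs: 1
  lr: 2.0e-5
"""

Path("autotrain_config.yml").write_text(CONFIG)
# Then, from the shell (after `pip install autotrain-advanced`):
#   autotrain --config autotrain_config.yml
```

See the repo README for the exact CLI invocation and the full list of per-task parameters.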
abhishek posted an update 9 months ago
Trained another version of llama3-8b-instruct which beats the base model, this time without losing too many points on the GSM8K benchmark. Again, using AutoTrain 💥 pip install autotrain-advanced
Trained model: abhishek/autotrain-llama3-orpo-v2
abhishek posted an update 9 months ago
With AutoTrain, you can already finetune the latest llama3 models without writing a single line of code. Here's an example finetune of the llama3 8b model: abhishek/autotrain-llama3-no-robots
jamarks posted an update 9 months ago
FiftyOne Datasets <> Hugging Face Hub Integration!

As of yesterday's release of FiftyOne 0.23.8, the FiftyOne open source library for dataset curation and visualization is now integrated with the Hugging Face Hub!

You can now load Parquet datasets from the hub and have them converted directly into FiftyOne datasets. To load MNIST, for example:

pip install -U fiftyone


import fiftyone as fo
import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub(
    "mnist",
    format="ParquetFilesDataset",
    classification_fields="label",
)
session = fo.launch_app(dataset)


You can also load FiftyOne datasets directly from the hub. Here's how you load the first 1000 samples from the VisDrone dataset:

import fiftyone as fo
import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub("jamarks/VisDrone2019-DET", max_samples=1000)

# Launch the App
session = fo.launch_app(dataset)


And tying it all together, you can push your FiftyOne datasets directly to the hub:

import fiftyone.zoo as foz
import fiftyone.utils.huggingface as fouh

dataset = foz.load_zoo_dataset("quickstart")
fouh.push_to_hub(dataset, "my-dataset")


Major thanks to @tomaarsen @davanstrien @severo @osanseviero and @julien-c for helping to make this happen!!!

Full documentation and details here: https://docs.voxel51.com/integrations/huggingface.html#huggingface-hub
harpreetsahota posted an update 11 months ago
google/gemma-7b-it is super good!

I wasn't convinced at first, but after vibe-checking it... I'm quite impressed.

I've got a notebook here, which is kind of a framework for vibe-checking LLMs.

In this notebook, I take Gemma for a spin on a variety of prompts:
• nonsensical tokens: harpreetsahota/diverse-token-sampler
• a conversation where I try to get some PII: harpreetsahota/red-team-prompts-questions
• summarization ability: lighteval/summarization
• instruction following: harpreetsahota/Instruction-Following-Evaluation-for-Large-Language-Models
• chain-of-thought reasoning: ssbuild/alaca_chain-of-thought

I then used LangChain evaluators (GPT-4 as judge) and tracked everything in LangSmith. I made the trace links public so you can inspect the runs.

I hope you find this helpful, and I am certainly open to feedback, criticisms, or ways to improve.

Cheers!

You can find the notebook here: https://colab.research.google.com/drive/1RHzg0FD46kKbiGfTdZw9Fo-DqWzajuoi?usp=sharing