
Hugging Face x Scikit-learn

In this sprint, we will build interactive demos from the scikit-learn documentation and, afterwards, contribute the demos directly to the docs.

Important Dates

🌅 Sprint Start Date: Apr 12, 2023
🌃 Sprint Finish Date: Apr 30, 2023

To get started 🤩

  1. Join our Discord and take the #sklearn-sprint-participant role by selecting "Sklearn Working Group" in the #role-assignment channel. Then, meet us in the #sklearn-sprint channel.

  2. Head to this page and pick an example you’d like to build on.

  3. Claim the example by leaving a comment on this spreadsheet with your name under the Owner column. The spreadsheet lists a limited number of examples. If the example you want is not in the spreadsheet, feel free to add it with a comment.

  4. Start building!

    We will be hosting our applications in the scikit-learn organization on Hugging Face.

    For complete beginners: on the Hugging Face Hub, there are repositories for models, datasets, and Spaces. Spaces are a special type of repository that hosts ML applications, such as a demo showcasing a model. To write our apps, we will only be using Gradio. Gradio is a library that lets you build a front-end application for your models entirely in Python, and it supports many libraries. In this sprint, we will mostly use its visualization support (matplotlib, plotly, altair and more) and its skops integration (which lets you launch an interface for a given classification or regression model with one line of code).

    In Gradio, there are two ways to create a demo. One is to use Interface, which is a very simple abstraction. Let’s see an example.

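    import gradio as gr
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # implement your classifier here (a decision tree is used as a placeholder)
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train, y_train)

    def cancer_classifier(df):
        # df must hold the 30 feature values of one sample, in dataset order;
        # the "label" output shows a single value, so return the first prediction
        predictions = clf.predict(df.to_numpy(dtype=float))
        return str(predictions[0])

    gr.Interface(fn=cancer_classifier, inputs="dataframe",
                 outputs="label").launch()

    # save this in a file called app.py
    # then run it with: python app.py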
    

    This will result in the following interface:

    Simple Interface

    This is very customizable. You can specify rows and columns, add a title and description, an example input, and more. There’s a more detailed guide here.
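As a rough illustration of those options, the sketch below sets a title, a description, named input columns and a fixed row count. The two feature names, the tree depth and the function names `classify_first_row` and `build_demo` are illustrative choices rather than part of the sprint materials, and the example input is left out for brevity.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Train on two features only so the input table stays small (illustrative choice).
data = load_breast_cancer()
FEATURES = ["mean radius", "mean texture"]
columns = [list(data.feature_names).index(name) for name in FEATURES]
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(data.data[:, columns], data.target)


def classify_first_row(table):
    """Return class probabilities for the first row of the input table."""
    row = np.asarray(table, dtype=float)[:1]
    probabilities = clf.predict_proba(row)[0]
    # A dict of {class name: probability} is rendered by the Label output.
    return {str(name): float(p) for name, p in zip(data.target_names, probabilities)}


def build_demo():
    # Imported here so the code above can be reused without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=classify_first_row,
        inputs=gr.Dataframe(headers=FEATURES, row_count=1, type="pandas"),
        outputs=gr.Label(num_top_classes=2),
        title="Breast cancer classifier",
        description="Enter two measurements to get class probabilities.",
    )
```

To serve it, end `app.py` with `build_demo().launch()`.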

    Another way of creating an application is to use Blocks. You can see usage of Blocks in the example applications linked in this guide.
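For orientation, here is a minimal Blocks sketch, assuming a demo loosely modeled on the hierarchical clustering example mentioned in this guide. The layout, the toy dataset and the names `plot_clusters` and `build_demo` are assumptions made for illustration; the linked example applications remain the reference.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, suitable for servers without a display
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons


def plot_clusters(linkage, n_samples):
    """Cluster a toy two-moons dataset and return a matplotlib figure."""
    X, _ = make_moons(n_samples=int(n_samples), noise=0.05, random_state=0)
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.scatter(X[:, 0], X[:, 1], c=labels, s=10)
    ax.set_title(f"linkage = {linkage}")
    return fig


def build_demo():
    # Imported here so plot_clusters can be reused without Gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        gr.Markdown("## Hierarchical clustering on a toy dataset")
        with gr.Row():
            linkage = gr.Dropdown(
                choices=["ward", "complete", "average", "single"],
                value="ward",
                label="Linkage",
            )
            n_samples = gr.Slider(50, 500, value=200, step=50, label="Samples")
        plot = gr.Plot()
        button = gr.Button("Run")
        # Clicking the button calls plot_clusters and shows the returned figure.
        button.click(fn=plot_clusters, inputs=[linkage, n_samples], outputs=plot)
    return demo
```

Unlike Interface, Blocks lets you place components in rows and wire each event yourself. To serve the demo, end `app.py` with `build_demo().launch()`.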

    After we create our application, we will create a Space. Go to hf.co, click on your profile picture in the top right, and select “New Space”.

    New Space

    We can name our Space, pick a license, and select “Gradio” as the Space SDK. The free hardware is enough for our app, so there is no need to change it.

    Space Configuration

    After creating the Space, you have three options:

    • You can clone the repository locally, add your files, and then push them to the Hub.
    • You can do all your coding directly in the browser.
    • (shown below) You can do the coding locally and then drag and drop your application file to the Hub.

    Space Config

    To upload your application file, pick “Add File” and drag and drop your file.
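If you would rather script the upload than drag and drop, the same step can probably be done with the `huggingface_hub` client library. This is a sketch outside the three options listed above: `upload_app` is a hypothetical helper, the Space id in the usage note is a placeholder, and the upload needs a token with write access to the Space.

```python
from pathlib import Path


def upload_app(space_id, app_path="app.py", token=None):
    """Upload a local application file to an existing Space as app.py."""
    if not Path(app_path).is_file():
        raise FileNotFoundError(f"{app_path} does not exist")

    # Imported here so the path check above works without the library installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    return api.upload_file(
        path_or_fileobj=str(app_path),
        path_in_repo="app.py",
        repo_id=space_id,
        repo_type="space",  # Spaces are a repository type of their own
    )
```

A call would look like `upload_app("your-username/your-space-name")`, with the placeholder replaced by the id of the Space you created.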

    New Space Landing

    Lastly, if your application includes any library other than Gradio, create a file called requirements.txt and add requirements like below:

    matplotlib==3.6.3
    scikit-learn==1.2.1
    

    And your app should be up and running!
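One way to avoid typing versions by hand is to read them from your local environment, so the Space installs what you tested with. The helper names below are hypothetical, and the package list in the usage note is only an example.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for the packages installed locally."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment, so there is no version to pin.
            print(f"skipping {name}: not installed")
    return lines


def write_requirements(packages, path="requirements.txt"):
    """Write the pinned lines to a requirements file and return them."""
    lines = pinned_requirements(packages)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

For example, `write_requirements(["matplotlib", "scikit-learn"])` writes one pinned line per installed package. Gradio is left out of the list because the Gradio SDK already provides it.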

    Example Submissions

    We have left a couple of examples below (there are more at the end of this page). Here is the documentation page for comparing linkage methods for hierarchical clustering, along with an example Space built on it 👇🏼

    Comparing different hierarchical linkage methods on toy datasets

    Hierarchical Clustering Linkage - a Hugging Face Space by scikit-learn

    Note: If your demo trains a model from scratch (e.g. an image classifier), you can push the model to the Hub using skops and build a Gradio demo on top of it. For such a submission, we expect a model repository with a model card and the model weights, as well as a simple Space with an interface that receives input and outputs results. You can use this tutorial to get started with skops.

    You can find an example submission for a model repository below.

    scikit-learn/cancer-prediction-trees · Hugging Face

  5. After the demos are done, we will open pull requests against scikit-learn’s repository to contribute our application code directly to the documentation. We will help you out if this is your first open source contribution. 🤗

 

If you need any help, you can join our Discord server, take the collaborate role, and ask questions in the sklearn-sprint channel 🤗🫂

Sprint Prizes

We will be giving away the following vouchers, which can be spent at the Hugging Face Store, including shipping:

  • a $20 voucher for everyone who builds three demos,
  • a $40 voucher for everyone who builds five demos.