---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE
---

# FLUX.1 [dev] -- Flumina Server App (FP8 Version)

This repository contains an implementation of the [FP8 version](https://github.com/aredden/flux-fp8-api) of FLUX.1 [dev], which uses float8 numerics instead of bfloat16. This optimization makes inference 2x faster than previous versions, which suits high-speed, resource-efficient applications built with Fireworks AI’s Flumina Server App toolkit.

![Example output](example.png)

## Getting Started -- Serverless deployment on Fireworks

This FP8 Server App is deployed to Fireworks as-is in a "serverless" deployment, so you can use its performance boost without managing servers yourself.

Grab an [API Key](https://fireworks.ai/account/api-keys) from Fireworks and set it in your environment variables:

```bash
export API_KEY=YOUR_API_KEY_HERE
```

### Text-to-Image Example Call

```bash
curl -X POST 'https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/flux-1-dev-fp8/text_to_image' \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -H "Accept: image/jpeg" \
    -d '{
        "prompt": "Woman laying in the grass",
        "aspect_ratio": "16:9",
        "guidance_scale": 3.5,
        "num_inference_steps": 30,
        "seed": 0
    }' \
    --output output.jpg
```

![Output of text-to-image](t2i_output.jpg)
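
The same call can be made from Python. The following is a minimal sketch using only the standard library, assuming the endpoint, headers, and payload fields shown in the curl example above. The helper names `build_request` and `generate` are hypothetical and are not part of any Fireworks SDK.

```python
import json
import os
import urllib.request

# Endpoint copied from the curl example above.
URL = (
    "https://api.fireworks.ai/inference/v1/workflows/"
    "accounts/fireworks/models/flux-1-dev-fp8/text_to_image"
)


def build_request(prompt, api_key, seed=0):
    """Build (but do not send) the text-to-image POST request.

    Hypothetical helper: the payload fields mirror the curl example.
    """
    payload = {
        "prompt": prompt,
        "aspect_ratio": "16:9",
        "guidance_scale": 3.5,
        "num_inference_steps": 30,
        "seed": seed,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "image/jpeg",
        },
        method="POST",
    )


def generate(prompt, out_path="output.jpg", seed=0):
    """Send the request and write the returned image bytes to out_path."""
    request = build_request(prompt, os.environ["API_KEY"], seed=seed)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        image_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return out_path


# Example (needs API_KEY in the environment and network access):
# generate("Woman laying in the grass")
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without spending any inference calls.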

## Deploying FLUX.1 [dev] to Fireworks On-Demand

FLUX.1 [dev] (FP8) is available on Fireworks via [on-demand deployments](https://docs.fireworks.ai/guides/ondemand-deployments). It can be deployed in a few simple steps:

### Prerequisite: Install the Flumina CLI

The Flumina CLI is included with the [fireworks-ai](https://pypi.org/project/fireworks-ai/) Python package. Install it with pip:

```bash
pip install 'fireworks-ai[flumina]>=0.15.7'
```

Also get an API key from the [Fireworks site](https://fireworks.ai/account/api-keys) and set it in the Flumina CLI:

```bash
flumina set-api-key YOURAPIKEYHERE
```

### Creating an On-Demand Deployment

Use `flumina deploy` to create an on-demand deployment. When invoked with the name of an existing model, it creates a new deployment of that model in your account:

```bash
flumina deploy accounts/fireworks/models/flux-1-dev-fp8 --accelerator-type H100
```

*Note: FP8 FLUX models require `--accelerator-type H100` to deploy successfully.*

When successful, the CLI will print out example commands to call your new deployment, for example:

```bash
curl -X POST 'https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/flux-1-dev-fp8/text_to_image?deployment=accounts/u-6jamesr6-63834f/deployments/a0dab4ba' \
    -H 'Authorization: Bearer API_KEY' \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "<value>",
        "aspect_ratio": "16:9",
        "guidance_scale": 3.5,
        "num_inference_steps": 30,
        "seed": 0
    }'
```
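
The only difference from the serverless call is the `deployment` query parameter, which routes the request to your own deployment. A minimal Python sketch of building such a URL follows, assuming the URL layout printed by the CLI above. The function `workflow_url` is a hypothetical helper, and the deployment name in the usage line is the placeholder from the example output.

```python
from urllib.parse import urlencode

# Base path taken from the example URLs in this README.
BASE_URL = "https://api.fireworks.ai/inference/v1/workflows"


def workflow_url(model, deployment=None, task="text_to_image"):
    """Return the workflow URL, optionally pinned to one deployment.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    url = f"{BASE_URL}/{model}/{task}"
    if deployment:
        # safe="/" keeps the slashes unescaped, matching the CLI output.
        url += "?" + urlencode({"deployment": deployment}, safe="/")
    return url


# Serverless endpoint (no deployment parameter):
serverless = workflow_url("accounts/fireworks/models/flux-1-dev-fp8")

# On-demand endpoint, using the placeholder deployment from the CLI output:
on_demand = workflow_url(
    "accounts/fireworks/models/flux-1-dev-fp8",
    deployment="accounts/u-6jamesr6-63834f/deployments/a0dab4ba",
)
```

Substitute the deployment name that the CLI prints for your own account.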

Your deployment can also be administered using the Flumina CLI. Useful commands include:
* `flumina list deployments` to show all of your deployments
* `flumina get deployment` to get details about a specific deployment
* `flumina delete deployment` to delete a deployment

## What is Flumina?

Flumina is Fireworks AI’s new system for hosting Server Apps. It lets users deploy deep learning inference to production in minutes, not weeks.

## What does Flumina offer for FLUX models?

Flumina offers the following benefits:

* Clear, precise definition of the server-side workload by looking at the server app implementation (you are here)
* Extensibility interface, which allows for dynamic loading/dispatching of add-ons server-side. For FLUX:
  * ControlNet (Union) adapters
  * LoRA adapters
* Off-the-shelf support for standing up on-demand capacity for the Server App on Fireworks
  * The deployment's logic can be further customized by modifying the Server App and deploying the modified version.
* Now with support for FP8 numerics, delivering enhanced speed and efficiency for intensive workloads.

## Deploying Custom FLUX.1 [dev] FP8 Apps to Fireworks On-Demand

Coming soon!