---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE
---

# FLUX.1 [dev] -- Flumina Server App (FP8 Version)

This repository contains an implementation of the FLUX.1 [dev] [FP8 version](https://github.com/aredden/flux-fp8-api), which uses float8 numerics in place of bfloat16. This optimization yields up to 2x faster inference than the bfloat16 version, making it ideal for high-speed, resource-efficient applications on Fireworks AI’s Flumina Server App toolkit.

![Example output](example.png)

## Getting Started -- Serverless deployment on Fireworks

This FP8 Server App is deployed to Fireworks as-is in a "serverless" deployment, enabling you to leverage its performance boost without managing servers yourself.

Grab an [API key](https://fireworks.ai/account/api-keys) from Fireworks and set it in your environment variables:

```bash
export API_KEY=YOUR_API_KEY_HERE
```

### Text-to-Image Example Call

```bash
curl -X POST 'https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/flux-1-dev-fp8/text_to_image' \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -H "Accept: image/jpeg" \
    -d '{
        "prompt": "Woman laying in the grass",
        "aspect_ratio": "16:9",
        "guidance_scale": 3.5,
        "num_inference_steps": 30,
        "seed": 0
    }' \
    --output output.jpg
```

![Output of text-to-image](t2i_output.jpg)
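If you prefer calling the endpoint from Python, the following is a minimal sketch equivalent to the curl example above. It uses the third-party `requests` library (an assumption; any HTTP client works) and reads `API_KEY` from the environment, as set earlier:

```python
import os

import requests  # third-party: pip install requests

URL = (
    "https://api.fireworks.ai/inference/v1/workflows"
    "/accounts/fireworks/models/flux-1-dev-fp8/text_to_image"
)

# Same request as the curl example above.
response = requests.post(
    URL,
    headers={
        "Authorization": f"Bearer {os.environ['API_KEY']}",
        "Content-Type": "application/json",
        "Accept": "image/jpeg",
    },
    json={
        "prompt": "Woman laying in the grass",
        "aspect_ratio": "16:9",
        "guidance_scale": 3.5,
        "num_inference_steps": 30,
        "seed": 0,
    },
)
response.raise_for_status()

# With "Accept: image/jpeg", the response body is raw JPEG bytes.
with open("output.jpg", "wb") as f:
    f.write(response.content)
```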
## Deploying FLUX.1 [dev] to Fireworks On-Demand

FLUX.1 [dev] (bfloat16) is available on Fireworks via [on-demand deployments](https://docs.fireworks.ai/guides/ondemand-deployments). It can be deployed in a few simple steps:

### Prerequisite: Install the Flumina CLI

The Flumina CLI is included with the [fireworks-ai](https://pypi.org/project/fireworks-ai/) Python package. It can be installed with pip like so:

```bash
pip install 'fireworks-ai[flumina]>=0.15.7'
```

Also get an API key from the [Fireworks site](https://fireworks.ai/account/api-keys) and set it in the Flumina CLI:

```bash
flumina set-api-key YOURAPIKEYHERE
```

### Creating an On-Demand Deployment

`flumina deploy` can be used to create an on-demand deployment. When invoked with the name of an existing model, it creates a new deployment in your account that serves that model:

```bash
flumina deploy accounts/fireworks/models/flux-1-dev-fp8 --accelerator-type H100
```

*Note that FP8 FLUX models require `--accelerator-type H100` to deploy successfully.*

When successful, the CLI will print example commands to call your new deployment, for example:

```bash
curl -X POST 'https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/flux-1-dev-fp8/text_to_image?deployment=accounts/u-6jamesr6-63834f/deployments/a0dab4ba' \
    -H 'Authorization: Bearer API_KEY' \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "",
        "aspect_ratio": "16:9",
        "guidance_scale": 3.5,
        "num_inference_steps": 30,
        "seed": 0
    }'
```

Your deployment can also be administered using the Flumina CLI. Useful commands include:

* `flumina list deployments` to show all of your deployments
* `flumina get deployment` to get details about a specific deployment
* `flumina delete deployment` to delete a deployment

## What is Flumina?

Flumina is Fireworks AI’s new system for hosting Server Apps that lets users deploy deep learning inference to production in minutes, not weeks.

## What does Flumina offer for FLUX models?

Flumina offers the following benefits:

* Clear, precise definition of the server-side workload by inspecting the server app implementation (you are here)
* An extensibility interface that allows dynamic loading/dispatching of add-ons server-side. For FLUX:
  * ControlNet (Union) adapters
  * LoRA adapters
* Off-the-shelf support for standing up on-demand capacity for the Server App on Fireworks
* Further customization of the deployment's logic by modifying the Server App and deploying the modified version
* Now with support for FP8 numerics, delivering enhanced speed and efficiency for intensive workloads

## Deploying Custom FLUX.1 [dev] FP8 Apps to Fireworks On-demand

Coming soon!