20240818-flux_lora_training.jpeg

<aside> 🎨 Training a LoRA for flux.1-dev and flux.1-schnell on a 24GB GPU and image generation using ComfyUI

</aside>

Revision: 20250126-0 (init: 20240818)

The following uses the https://github.com/ostris/ai-toolkit GitHub repository to train a LoRA locally on user-provided images; we can then use the result to generate pictures with ComfyUI.

Running this training requires an Nvidia GPU with 24GB of Video RAM (VRAM).

We will train on Ubuntu 24.04 with a recent Nvidia driver installed, git, brew (to install useful commands), and Python (python3 with pip3 and the venv package installed, either via apt or brew).
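For illustration, on a stock Ubuntu 24.04 system the prerequisites can be installed along the following lines (package names assume apt; adapt the commands if you prefer brew):

# install git, Python 3 (3.12 on Ubuntu 24.04), pip, and the venv module via apt
sudo apt update
sudo apt install -y git python3 python3-pip python3-venv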

Recent developments in the Flux model ecosystem include advancements in FP8 and NF4 quantization formats and enhancements in LoRA support. Earlier this week, source code to enable the training of a Flux LoRA on user-provided images was announced: https://github.com/ostris/ai-toolkit

We will use it to create a LoRA for Flux.1-Dev and Flux.1-Schnell. We will then exercise the trained LoRAs with ComfyUI (workflows embedded in the generated images) to generate pictures.

Requirements:

Some of those topics were covered in “FLUX.1[dev] with ComfyUI and Stability Matrix”

FLUX.1[dev] with ComfyUI and Stability Matrix (Rev: 20250322-0)

In the following, we will use tok as the prompt trigger word and incorporate it into our file and directory names.

We will train our LoRA to 4000 steps (you are not required to go this far; good results can be obtained at lower settings, such as 2000). From previous tests, this takes over 4 hours on an NVIDIA RTX 3090 and about 2 hours 20 minutes on an RTX 4090.

During training, if the Linux system has a window manager running, a web browser, or any service that uses the GPU (on Dockge, for example, Ollama or others), terminate as many of them as possible. Use nvidia-smi to check what is using the GPU's memory and reduce VRAM consumption before training. Most of the following steps can be done over ssh in a tmux terminal (training can take a few hours; being able to re-attach to a session will be helpful) or in a Visual Studio Code Remote connection.

During a test run on an Ubuntu Desktop system accessed remotely, only two processes (Xorg and gnome-shell) were using the GPU, and nvidia-smi listed only 141MB of our 24GB of VRAM in use.
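For example, checking VRAM usage and starting a re-attachable terminal session can be done as follows (the session name below is only an example):

# list the processes currently using the GPU and their VRAM consumption
nvidia-smi
# start a named tmux session for the long-running steps
tmux new -s flux_lora
# later, re-attach to it (for example after an ssh disconnection)
tmux attach -t flux_lora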

Preliminary steps

Code retrieval and virtualenv creation

<aside> ⚠️

If you use brew, your system might have Python 3.13 as your default. However, it is not compatible (yet) with PyTorch and other requirements, so we will be using python3.12 directly for some of the following commands.

</aside>

Obtain the source code, create a virtual environment, and install the packages listed in requirements.txt.

We will use a base directory called Flux_Lora in the home (~) of our user account, where we will place the required components.

mkdir ~/Flux_Lora
cd ~/Flux_Lora

# Obtain the source and the code's submodules
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive

# Create and enable a python virtual environment to place all the needed packages
python3.12 -m venv venv
source venv/bin/activate
# for future use, re-enabling the venv can be done by running the source command again

# Confirm python3 and pip3 are from our venv
which python3
which pip3

# Install the required packages
pip3 install torch
pip3 install -r requirements.txt
pip3 install mediapipe peft 
# mediapipe appears to be needed for Dev, while peft will be needed for Schnell
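As an optional sanity check, we can confirm that the PyTorch installed in the virtual environment can see the GPU before going further:

# should print "True" if CUDA is usable from the venv
python3 -c "import torch; print(torch.cuda.is_available())"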

Hugging Face token

Downloading content from HuggingFace.co (HF) requires a read token. Content from HF will be placed in the directory pointed to by the HF_HOME environment variable, which defaults to ~/.cache/huggingface. This environment variable can be changed to match your preferences; see https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhome for details. We will use the default to maximize caching opportunities.
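For reference, relocating the cache (for example, to a larger disk) only requires exporting the variable before running the tools; the path below is just an example:

# optional: relocate the Hugging Face cache (example path, adapt to your setup)
export HF_HOME=/data/huggingface
# add the export to your shell profile (~/.bashrc or similar) to make it persistent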

We will need an HF token to log in. See https://huggingface.co/docs/hub/security-tokens for details on how to obtain this “download-only” read token.

We then use the huggingface-cli to set our token, per https://huggingface.co/docs/huggingface_hub/en/guides/cli#huggingface-cli-login

# Install the CLI using brew
brew install huggingface-cli

# Confirm the token is valid and add it to the default HF_HOME at ~/.cache/huggingface
huggingface-cli login
# Answer no to "Add token as git credential" as this is a "download-only" token
# This will store the token in ~/.cache/huggingface/token
# All models retrieved from HF's hub will end up in ~/.cache/huggingface/hub
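Once logged in, the stored token can be verified with the CLI's whoami command:

# confirm the stored token is valid; this prints your HF username
huggingface-cli whoami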

FLUX.1-dev is hosted on HF, with the note, “This repository is publicly accessible, but you have to accept the conditions to access its files and content.” If you intend to use this model for training, follow the steps detailed on https://huggingface.co/black-forest-labs/FLUX.1-dev to accept the terms. If the terms are not accepted, the model will not be accessible.

FLUX.1-schnell uses a different license. See https://huggingface.co/black-forest-labs/FLUX.1-schnell for details.

Having the HF token configured will also prove useful for any miscellaneous side files that the tools might need to download from HF.

Dataset preparation

Prepare a folder for a set of image files:

mkdir ~/Flux_Lora/training_images-tok

When this folder is used during training, the tool will create a _latent_cache folder in it to store .safetensors files characterizing the images to train on.

Place your images in this location; the recommended image size is 1024x1024. The script requires them to be in .jpg or .png format. During its preliminary steps, the tool will resize them for processing, with maximum per-side sizes of 512, 768, and 1024 pixels.
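If your source pictures are much larger, they can optionally be pre-resized; one possible approach, assuming ImageMagick is installed, is shown below (the tool resizes on its own, so this step is not required):

# optional: shrink images so their longest side is at most 1024 pixels
# (mogrify modifies files in place, so run it on copies of your originals)
cd ~/Flux_Lora/training_images-tok
mogrify -resize "1024x1024>" *.jpg *.png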

When training on a person, it is recommended to:

LoRA Training

Flux.1-Dev

The training step requires the adaptation of a configuration file. For flux.1-dev, an example configuration file is in ~/Flux_Lora/ai-toolkit/config/examples/train_lora_flux_24gb.yaml

cd ~/Flux_Lora/ai-toolkit
# create an output directory for the trained LoRA(s)
mkdir ../trained_LoRA
# copy the example training yaml and rename it
cp config/examples/train_lora_flux_24gb.yaml config/flux_dev-tok.yaml

Edit ~/Flux_Lora/ai-toolkit/config/flux_dev-tok.yaml with your preferred editor (we will use VSCode with a Remote SSH connection into the system to limit additional VRAM use). The configuration file supports relative paths (for example, ../../directory) but not the ~ character representing the user's home directory, so we will specify folders relative to the ai-toolkit directory in our configuration file.

The comments in the YAML file inform us of the expected use case for each entry in the configuration file. As such, here are the limited changes we made:

For reference, the entire configuration file is available at flux_dev-tok.yaml
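As an illustration only (the key names follow the train_lora_flux_24gb.yaml example shipped with ai-toolkit; exact values depend on your setup and the toolkit version), the kind of entries typically adjusted looks like this excerpt:

# excerpt of config/flux_dev-tok.yaml (illustrative values, not the full file)
config:
  name: "flux_dev-tok"                    # base name for the output files
  process:
    - type: "sd_trainer"
      training_folder: "../trained_LoRA"  # where .safetensors and samples are written
      trigger_word: "tok"                 # the prompt trigger word used in this guide
      save:
        save_every: 250                   # write a usable LoRA every N steps
      datasets:
        - folder_path: "../training_images-tok"
      train:
        steps: 4000                       # 2000 can already give good results
      sample:
        sample_every: 250                 # generate sample images every N steps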

The next step consists of starting the training and waiting for it to complete. In a terminal within our tmux session:

# Make sure you are running within the previously created virtual env
# if needed: cd ~/Flux_Lora/ai-toolkit; source venv/bin/activate
time python3 ./run.py config/flux_dev-tok.yaml

The tool will validate the yaml file, download the required models (over 30GB), generate latent content from the training images, and start training.

To see CPU and GPU usage, we can use nvitop (runnable with pipx, itself installable using brew install pipx). In a new tmux terminal: pipx run nvitop

4090DevTraining.png

During training, it is possible to see the sample images generated at each sample_every step. Those will be in ~/Flux_Lora/trained_LoRA/flux_dev-tok/samples, with file names such as 1723999194004__000000000_0.jpg, decomposed as “unix timestamp __ steps performed _ prompt number”. Being “step 0”, this image is one of the initial “before training” samples; being “prompt 0”, it corresponds to the prompt we expect to remain consistent from generation to generation (a woman holding a coffee cup […]).

For each save_every step, a new .safetensors file is stored in the output directory. This is a usable LoRA. For example, at step 250, we get flux_dev-tok_000000250.safetensors. Based on the samples generated, we can cancel training early if a given save provides better results than another.

