Denoising diffusion models have recently gained prominence as powerful tools for a variety of image generation and manipulation tasks. Building on this, we propose a novel tool for real-time image editing that provides users with fine-grained, region-targeted supervision in addition to existing prompt-based controls. Our editing technique, termed Layered Diffusion Brushes, leverages prompt-guided, region-targeted alteration of intermediate denoising steps, enabling precise modifications while maintaining the integrity and context of the input image. We provide an editor built on Layered Diffusion Brushes that incorporates well-known image editing concepts such as layer masks, visibility toggles, and independent manipulation of layers, regardless of their order. Our system renders a single edit on a 512x512 image within 140 ms on a high-end consumer GPU, enabling real-time feedback and rapid exploration of candidate edits. We validated our method and editing system through a user study involving both natural images (via inversion) and generated images, showing its usability and effectiveness for refining images compared to existing techniques such as InstructPix2Pix and Stable Diffusion Inpainting. Our approach proves effective across a range of tasks, including object attribute adjustment, error correction, and sequential prompt-based object placement and manipulation, highlighting its versatility and potential for enhancing creative workflows.
Layered Diffusion Brushes uses a Latent Diffusion Model-based approach to edit images. Our method is training-free and performs edits by making targeted adjustments to the intermediate latents of the input image during the diffusion process.
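For concreteness, the sketch below shows one way to capture the intermediate latents that such edits operate on, using the step-end callback of the Hugging Face diffusers library; the checkpoint name, step count, and cache structure are illustrative assumptions, not a prescription of our implementation.

```python
# Minimal sketch: caching intermediate latents during generation,
# assuming a recent diffusers version with callback_on_step_end support.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

latent_cache = {}  # step index -> intermediate latent

def cache_latents(pipe, step, timestep, callback_kwargs):
    # Store a copy of the latent at every denoising step so an edit can
    # later restart the diffusion from any intermediate point.
    latent_cache[step] = callback_kwargs["latents"].detach().clone()
    return callback_kwargs

image = pipe(
    "photo of paris in fall",
    num_inference_steps=20,
    callback_on_step_end=cache_latents,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
```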
During the editing process, the algorithm combines a new noisy latent 𝑆′ with the original latent from step 𝑛 using a mask 𝑚 and a strength control 𝛼. The denoising procedure then proceeds by applying the edit prompt over several steps. At a specific step 𝑡 (fixed at 𝑡 = 𝑁 − 2), the original latent and the newly modified latent are merged with the intermediate latent from the previous layer using masking, and the final edited image is obtained by decoding the latent produced by the remaining denoising steps.
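The sketch below spells out the two masking operations described above in plain PyTorch. The exact mixing rule is an assumption inferred from the description (a linear blend inside the mask, scaled by the strength 𝛼); tensor shapes are simplified.

```python
import torch

def blend_latents(original_latent: torch.Tensor,
                  new_noise: torch.Tensor,
                  mask: torch.Tensor,
                  alpha: float) -> torch.Tensor:
    """Combine the new noisy latent S' with the cached latent from step n.
    mask is 1 inside the edit region and 0 outside; alpha is the strength
    control. Outside the mask the original latent passes through unchanged."""
    return original_latent + alpha * mask * (new_noise - original_latent)

def merge_with_previous_layer(edited_latent: torch.Tensor,
                              previous_latent: torch.Tensor,
                              mask: torch.Tensor) -> torch.Tensor:
    """At step t = N - 2, keep the edited latent inside the mask and the
    previous layer's intermediate latent everywhere else."""
    return mask * edited_latent + (1.0 - mask) * previous_latent
```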
Our method leverages a highly optimized pipeline combined with intermediate latent caching to achieve real-time performance. Using Layered Diffusion Brushes, a single edit on a 512x512 image can be rendered within 140 ms (up to 7 fps) on a high-end consumer GPU. This capability enables real-time feedback and rapid exploration of candidate edits.
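The speedup comes from never recomputing the denoising steps before the edit point. A rough sketch of that resume-from-cache loop is shown below, written against diffusers' UNet and scheduler interfaces; `unet`, `scheduler`, `edit_embeds`, `latent_cache`, and the step index `n` are assumed to come from the setup above, and classifier-free guidance is omitted for brevity.

```python
import torch

@torch.no_grad()
def rerender_from_step(unet, scheduler, edit_embeds, blended_latent, n,
                       num_steps=20):
    # blended_latent would come from, e.g.,
    # blend_latents(latent_cache[n], new_noise, mask, alpha).
    scheduler.set_timesteps(num_steps)
    latents = blended_latent
    for t in scheduler.timesteps[n:]:  # skip the cached prefix entirely
        noise_pred = unet(latents, t, encoder_hidden_states=edit_embeds).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents  # decode with the VAE to obtain the edited image
```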
Layered Diffusion Brushes enables powerful image layering and independent editing within each layer. Our editor offers two editing modes: a box mode driven by brush dragging, and a custom mask mode driven by mask scrolling.
Users can stack, hide, unhide, delete, and adjust the strength of layers independently, ensuring global consistency and flexibility in layer modifications regardless of order. This allows for precise and non-destructive editing across different layers.
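One illustrative way to organize such layers is sketched below, reusing `blend_latents` from the earlier sketch; the field names, the `noise_for` helper, and the compositing loop are assumptions for exposition, not the editor's actual data model.

```python
from dataclasses import dataclass

import torch

@dataclass
class EditLayer:
    prompt: str              # edit prompt applied during denoising
    mask: torch.Tensor       # 1 inside the edited region, 0 elsewhere
    strength: float = 0.5    # the alpha control from the method description
    visible: bool = True     # hidden layers are skipped, not deleted
    seed: int = 0            # fixes the layer's noise latent S'

def noise_for(seed: int, shape=(1, 4, 64, 64)) -> torch.Tensor:
    # Hypothetical helper: a reproducible noise latent for the layer.
    return torch.randn(shape, generator=torch.Generator().manual_seed(seed))

def composite(base_latent: torch.Tensor, layers: list) -> torch.Tensor:
    """Apply every visible layer's blend to the base latent. Because each
    layer only touches its own masked region, layers can be hidden,
    deleted, or re-weighted without disturbing the others."""
    latent = base_latent
    for layer in layers:
        if not layer.visible:
            continue
        latent = blend_latents(latent, noise_for(layer.seed),
                               layer.mask, layer.strength)
    return latent
```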
@misc{gholami2024streamlining,
title={Streamlining Image Editing with Layered Diffusion Brushes},
author={Peyman Gholami and Robert Xiao},
year={2024},
eprint={2405.00313},
archivePrefix={arXiv},
primaryClass={cs.CV}
}