Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing

Department of Electrical and Computer Engineering, University of Michigan
NeurIPS 2024

*Indicates Equal Contribution

"LOCO Edit allows localized editing that is homogeneous, composable, and linear."

Abstract

Recently, diffusion models have emerged as a powerful class of generative models. Despite their success, their semantic spaces remain poorly understood, which makes it challenging to achieve precise and disentangled image generation without additional training, especially in an unsupervised way. In this work, we improve the understanding of these semantic spaces through two observations: within a certain range of noise levels, (1) the learned posterior mean predictor (PMP) in the diffusion model is locally linear, and (2) the singular vectors of its Jacobian lie in low-dimensional semantic subspaces. We provide a theoretical justification for both the linearity and the low-rankness of the PMP. These insights allow us to propose an unsupervised, single-step, training-free LOw-rank COntrollable image editing (LOCO Edit) method for precise local editing in diffusion models. LOCO Edit identifies editing directions with desirable properties: homogeneity, transferability, composability, and linearity. These properties benefit greatly from the low-dimensional semantic subspace. Our method can further be extended to unsupervised or text-supervised editing in various text-to-image diffusion models (T-LOCO Edit). Finally, extensive experiments demonstrate the effectiveness and efficiency of LOCO Edit.

Low-rankness and Local Linearity in Diffusion Models

The low-rankness of the Jacobian and the local linearity of the PMP motivate the design of LOCO Edit.
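
As a rough way to read these two observations together, they can be written as a first-order expansion plus a truncated SVD. The notation below is assumed for illustration and is not taken from this page: f_t is the learned PMP at noise level t, x_t is the noisy image of dimension d, J_t is the Jacobian of f_t at x_t, and r is its effective rank.

```latex
% Assumed notation: f_t (PMP), x_t (noisy image, dimension d),
% J_t (Jacobian of f_t at x_t), r (effective rank), \lambda (edit strength).
\begin{align}
  f_t(x_t + \lambda v) &\approx f_t(x_t) + \lambda\, J_t v
      && \text{(local linearity)} \\
  J_t &\approx U_r \Sigma_r V_r^{\top}, \quad r \ll d
      && \text{(low-rankness)} \\
  f_t(x_t + \lambda v_i) &\approx f_t(x_t) + \lambda\, \sigma_i u_i
      && \text{(since } J_t v_i = \sigma_i u_i \text{)}
\end{align}
```

Under these assumptions, moving x_t along a right singular vector v_i changes the predicted image approximately along the matching left singular vector u_i, scaled linearly by the step size.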

Unsupervised LOCO Edit Method

Illustration of the unsupervised LOCO Edit for unconditional diffusion models.
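
The idea of taking an edit direction from the Jacobian of the PMP can be illustrated on a toy problem. The following is a minimal, dependency-free sketch, not the authors' implementation: `toy_pmp` is a hypothetical stand-in for a trained PMP, the Jacobian is estimated with finite differences, and the direction is the top right singular vector found by power iteration. Any masking or other steps of the actual method are not shown.

```python
import math
import random


def toy_pmp(x, A, B):
    """Hypothetical stand-in for a PMP: A @ tanh(B @ x).

    With B of shape (r, d) and A of shape (d, r), the Jacobian has rank <= r.
    """
    hidden = [math.tanh(sum(b * xi for b, xi in zip(row, x))) for row in B]
    return [sum(a * h for a, h in zip(row, hidden)) for row in A]


def jacobian(f, x, eps=1e-5):
    """Central finite-difference Jacobian as a list of rows (out_dim x in_dim)."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        cols.append([(p - m) / (2 * eps) for p, m in zip(f(xp), f(xm))])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def top_right_singular_vector(J, iters=200, seed=0):
    """Power iteration on J^T J; returns (sigma, v) with v of unit norm."""
    rng = random.Random(seed)
    Jt = [list(col) for col in zip(*J)]
    v = [rng.gauss(0, 1) for _ in range(len(J[0]))]
    for _ in range(iters):
        w = matvec(Jt, matvec(J, v))
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    sigma = math.sqrt(sum(c * c for c in matvec(J, v)))
    return sigma, v


# Toy setup: dimension d = 6, rank r = 2 (illustrative values only).
rng = random.Random(1)
d, r = 6, 2
A = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d)]
B = [[0.3 * rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]
x_t = [rng.gauss(0, 1) for _ in range(d)]


def f(x):
    return toy_pmp(x, A, B)


J = jacobian(f, x_t)
sigma, v = top_right_singular_vector(J)

# Single-step edit: move x_t along v and compare with the linear prediction.
lam = 0.01
edited = f([xi + lam * vi for xi, vi in zip(x_t, v)])
linear = [fi + lam * jv for fi, jv in zip(f(x_t), matvec(J, v))]
gap = max(abs(a - b) for a, b in zip(edited, linear))

print("top singular value:", sigma)
print("max deviation from linear prediction:", gap)
```

For a real diffusion model, the finite-difference Jacobian would be far too expensive; automatic differentiation with Jacobian-vector products is the usual substitute, which is an assumption here rather than something stated on this page.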

LOCO Edit Results

LOCO Edit across various datasets.

Visualizing semantically-meaningful edit directions identified via LOCO Edit.

Benchmarking LOCO Edit

Comparisons with existing methods.

Comparison of local editing ability with other works on non-cherry-picked images.

Analysis

T-LOCO Edit on T2I Diffusion Models

Extension of LOCO Edit to unsupervised or text-supervised editing (T-LOCO Edit) on T2I diffusion models.

BibTeX


  @inproceedings{chen2024exploringlowdimensionalsubspacesdiffusion,
    title={Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing},
    author={Siyi Chen and Huijie Zhang and Minzhe Guo and Yifu Lu and Peng Wang and Qing Qu},
    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
    year={2024},
    url={https://arxiv.org/abs/2409.02374}
  }