Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

¹SHI Labs @ Georgia Tech & UIUC, ²Tsinghua University, ³Picsart AI Research
*Indicates Equal Contribution

Smooth Diffusion, a new category of diffusion models that is simultaneously high-performing and smooth.

Smooth Diffusion for downstream image synthesis tasks.

Abstract

Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse content. Despite this advancement, the latent space smoothness of diffusion models remains largely unexplored. Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image. This property proves beneficial in downstream tasks, including image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by observing noticeable visual fluctuations resulting from minor latent variations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization to enforce that the ratio between the variation of an arbitrary input latent and that of the output image remains constant at any diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion stands out as a more desirable solution not only in T2I generation but also across various downstream tasks.
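To make the metric concrete, here is a minimal sketch of an ISTD-style score, assuming ISTD is computed as the standard deviation of distances between consecutive images generated along a uniformly interpolated latent path. The `generate_image` callable is a placeholder for the reader's own text-to-image call returning an image tensor, and the use of spherical interpolation (`slerp`) and L2 image distance are illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np
import torch

def slerp(z0, z1, t):
    """Spherical linear interpolation between two Gaussian latents."""
    v0, v1 = z0.flatten(), z1.flatten()
    omega = torch.acos(torch.clamp(torch.dot(v0 / v0.norm(), v1 / v1.norm()), -1.0, 1.0))
    return (torch.sin((1 - t) * omega) * z0 + torch.sin(t * omega) * z1) / torch.sin(omega)

@torch.no_grad()
def interpolation_std(generate_image, z0, z1, prompt, num_frames=11):
    """ISTD-style smoothness score: the standard deviation of L2 distances
    between consecutive images generated along an interpolated latent path.
    Lower values indicate a smoother latent space."""
    ts = torch.linspace(0.0, 1.0, num_frames)
    images = [generate_image(slerp(z0, z1, float(t)), prompt) for t in ts]
    diffs = [torch.dist(images[i], images[i + 1]).item() for i in range(num_frames - 1)]
    return float(np.std(diffs))
```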

Methodology

Smooth Diffusion (c) enforces that the ratio between the variation of the input latent and the variation of the output prediction remains constant. We propose Training-time Smooth Diffusion (d) to optimize a "single-step snapshot" of the variation constraint in (c). DM: Diffusion model.
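The sketch below illustrates how such a single-step variation constraint could be attached to a standard, diffusers-style training step. The perturbation scale `alpha`, the regularization weight, the choice to compare predicted clean latents, and the `predict_x0` helper are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def predict_x0(scheduler, xt, eps_pred, t):
    """Recover the predicted clean latent from an epsilon prediction (assumed helper)."""
    alpha_bar = scheduler.alphas_cumprod.to(xt.device)[t].view(-1, 1, 1, 1)
    return (xt - torch.sqrt(1.0 - alpha_bar) * eps_pred) / torch.sqrt(alpha_bar)

def training_step(unet, scheduler, x0, text_emb, alpha=0.05, reg_weight=1.0):
    """One training step combining the usual denoising loss with a step-wise
    variation regularizer: a small perturbation of the noisy latent should
    produce a proportional (here, equal) change in the predicted clean latent."""
    noise = torch.randn_like(x0)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (x0.shape[0],), device=x0.device)
    xt = scheduler.add_noise(x0, noise, t)

    # Standard epsilon-prediction loss.
    eps_pred = unet(xt, t, encoder_hidden_states=text_emb).sample
    loss_denoise = F.mse_loss(eps_pred, noise)

    # Perturb the input latent and measure the change in the predicted output.
    delta = alpha * torch.randn_like(xt)
    eps_pred_pert = unet(xt + delta, t, encoder_hidden_states=text_emb).sample
    x0_pred = predict_x0(scheduler, xt, eps_pred, t)
    x0_pred_pert = predict_x0(scheduler, xt + delta, eps_pred_pert, t)

    # Step-wise variation regularization: keep the input/output variation ratio fixed.
    loss_reg = F.mse_loss(x0_pred_pert - x0_pred, delta)

    return loss_denoise + reg_weight * loss_reg
```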

Downstream Tasks

Our method explicitly introduces latent space smoothness to diffusion models such as Stable Diffusion. This smoothness substantially helps to: 1) improve the continuity of transitions in image interpolation, 2) reduce approximation errors in image inversion, and 3) better preserve unedited content in image editing.


1. Image Interpolation

Using the Smooth LoRA trained atop Stable Diffusion V1.5.

Integrating the above Smooth LoRA into other community models.
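Below is a hedged sketch of loading such a Smooth LoRA into a diffusers Stable Diffusion v1.5 pipeline and generating an interpolation sequence. The LoRA path, prompt, and number of frames are placeholders, and `slerp` refers to the helper sketched under the ISTD example above.

```python
import torch
from diffusers import StableDiffusionPipeline

# Base model: Stable Diffusion v1.5 (or a compatible community checkpoint).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach the Smooth LoRA weights (repo id / path below is a placeholder).
pipe.load_lora_weights("path/to/smooth-lora")

# Two endpoint latents; intermediate frames come from spherically interpolated latents.
shape = (1, pipe.unet.config.in_channels, 64, 64)
z0 = torch.randn(shape, device="cuda", dtype=torch.float16)
z1 = torch.randn(shape, device="cuda", dtype=torch.float16)

frames = []
for t in torch.linspace(0.0, 1.0, steps=8):
    z = slerp(z0, z1, float(t))  # slerp helper from the ISTD sketch
    image = pipe("a photo of a corgi", latents=z).images[0]
    frames.append(image)
```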

2. Image Inversion


3. Image Editing

BibTeX

@article{guo2023smooth,
  title={Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models},
  author={Jiayi Guo and Xingqian Xu and Yifan Pu and Zanlin Ni and Chaofei Wang and Manushree Vasu and Shiji Song and Gao Huang and Humphrey Shi},
  journal={arXiv preprint arXiv:2312.04410},
  year={2023}
}