SNR-Edit: Structure-Aware Noise Rectification for Inversion-Free Flow-Based Editing

Lifan Jiang Boxi Wu Yuhang Pei Tianrun Wu
Yongyuan Chen Yan Zhao Shiyu Yu Deng Cai

State Key Lab of CAD&CG, Zhejiang University

Submitted to ICML 2026

Paper | GitHub | Dataset

SNR-Edit is a training-free, model-agnostic framework for inversion-free text-guided image editing with flow-based generative models. SNR-Edit identifies a Structural--Stochastic Mismatch, where a fixed Gaussian proxy drives the source trajectory off the source latent manifold and causes structural drift. To address this, SNR-Edit constructs a structure-aware prior by decomposing the source image into semantic regions (SAM2), encoding geometry with RoPE, and mapping region signatures through a frozen randomized projection to form a structural map. The map is resized to latent resolution, min--max normalized to [-1, 1], and broadcast to obtain a latent prior that modulates the noise. During integration, SNR-Edit mixes this prior with Gaussian noise to form rectified noise and computes a corrected source state that anchors the dynamics. It then evaluates the target velocity at a re-anchored position, yielding a rectified flow that preserves layout while executing edits. SNR-Edit improves structural fidelity and text alignment on SD3 and FLUX with minimal overhead.
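To illustrate the prior preparation described above (resize to latent resolution, min-max normalize to [-1, 1], broadcast across channels), here is a minimal NumPy sketch. It is not the released implementation: the function name `prepare_latent_prior`, the nearest-neighbor resizing, and the all-zeros fallback for a constant map are assumptions made for illustration, since the text does not specify the interpolation or the degenerate case.

```python
import numpy as np

def prepare_latent_prior(phi_map, latent_hw, num_channels):
    """Turn a single-channel structural map (H, W) into a latent prior (C, h, w).

    Hypothetical helper: nearest-neighbor resizing is an assumption.
    """
    h, w = phi_map.shape
    lh, lw = latent_hw
    # Nearest-neighbor resize by index sampling (assumed interpolation).
    rows = np.arange(lh) * h // lh
    cols = np.arange(lw) * w // lw
    resized = phi_map[np.ix_(rows, cols)].astype(np.float64)

    lo, hi = resized.min(), resized.max()
    if hi == lo:
        # Degenerate constant map: return zeros (assumed fallback).
        normalized = np.zeros_like(resized)
    else:
        # Min-max normalization from [lo, hi] to [-1, 1].
        normalized = 2.0 * (resized - lo) / (hi - lo) - 1.0

    # Broadcast the single channel across all latent channels.
    return np.broadcast_to(normalized, (num_channels, lh, lw)).copy()

# Toy usage: an 8x8 map reduced to a 16-channel 4x4 latent prior.
toy_map = np.arange(64, dtype=np.float64).reshape(8, 8)
phi_z = prepare_latent_prior(toy_map, (4, 4), 16)
```

In this sketch every latent channel receives the same normalized map, which matches the broadcast step in the description; the number of latent channels depends on the backbone and is passed in as a parameter here.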

Abstract

Inversion-free image editing using flow-based generative models challenges the prevailing inversion-based pipelines. However, existing approaches rely on fixed Gaussian noise to construct the source trajectory, which biases the trajectory dynamics and causes structural degradation or quality loss. To address this, we introduce SNR-Edit, a training-free framework that achieves faithful Latent Trajectory Correction via adaptive noise control. Mechanistically, SNR-Edit uses structure-aware noise rectification to inject segmentation constraints into the initial noise, anchoring the stochastic component of the source trajectory to the real image’s implicit inversion position and reducing trajectory drift during source–target transport. This lightweight modification yields smoother latent trajectories and high-fidelity structural preservation without requiring model tuning or inversion. Across SD3 and FLUX, evaluations on PIE-Bench and SNR-Bench show that SNR-Edit delivers improved performance on pixel-level metrics and VLM-based scoring, while adding only ~1s overhead per image.

Paper

arXiv:2601.19180, 2026.

Citation

Lifan Jiang, Boxi Wu, Yuhang Pei, Tianrun Wu, Yongyuan Chen, Yan Zhao, Shiyu Yu, and Deng Cai. "SNR-Edit: Structure-Aware Noise Rectification for Inversion-Free Flow-Based Editing". In ICML 2026.

Schematic comparison of latent editing dynamics

(a) Inversion-based methods rely on a bidirectional mapping to recover latent noise, yet face an inherent fidelity-editability trade-off and high sensitivity to perturbations, making it difficult to maintain structural consistency.
(b) FlowEdit circumvents inversion but is hindered by a fundamental Structural--Stochastic Mismatch. By initiating the flow from content-agnostic Gaussian noise \(\xi \sim \mathcal{N}(0, I)\), the source proxy is evaluated off the source latent manifold, causing the differential editing trajectory to drift and leading to structural distortion.
(c) Ours. We introduce structure-aware noise rectification by modulating Gaussian noise with a structural prior \(\Phi_{\mathcal{Z}}\) (prepared by resizing, min--max normalization to \([-1,1]\), and channel-wise broadcasting), forming a rectified noise \(\tilde{\epsilon} = \lambda_{\text{struct}} \Phi_{\mathcal{Z}} + \lambda_{\text{stoch}} \xi\). This yields a corrected latent source state \(\tilde{Z}^{\text{src}}_t = (1 - t) Z_{\text{src}} + t \tilde{\epsilon}\), which anchors flow integration and enables a rectified velocity evaluation that preserves structural fidelity while reflecting target semantics.
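The two formulas in (c) can be written out directly. The sketch below is a NumPy illustration only; the function names `rectified_noise` and `corrected_source_state`, the tensor shapes, and the example weights are assumptions and are not taken from the paper's code.

```python
import numpy as np

def rectified_noise(phi_z, xi, lam_struct, lam_stoch):
    """eps_tilde = lam_struct * Phi_Z + lam_stoch * xi."""
    return lam_struct * phi_z + lam_stoch * xi

def corrected_source_state(z_src, eps_tilde, t):
    """Z_src_t = (1 - t) * Z_src + t * eps_tilde."""
    return (1.0 - t) * z_src + t * eps_tilde

rng = np.random.default_rng(0)
shape = (16, 4, 4)                        # assumed (channels, height, width)
z_src = rng.standard_normal(shape)        # stand-in for the encoded source latent
phi_z = rng.uniform(-1.0, 1.0, shape)     # stand-in for the structural prior in [-1, 1]
xi = rng.standard_normal(shape)           # Gaussian noise

# Example weights chosen arbitrarily for illustration.
eps_tilde = rectified_noise(phi_z, xi, lam_struct=0.3, lam_stoch=0.7)
z_src_t = corrected_source_state(z_src, eps_tilde, t=0.5)
```

Two limiting cases follow from the formulas: setting \(\lambda_{\text{struct}} = 0\) and \(\lambda_{\text{stoch}} = 1\) recovers plain Gaussian noise as in (b), and the corrected state equals \(Z_{\text{src}}\) at \(t = 0\) and \(\tilde{\epsilon}\) at \(t = 1\).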

SNR-Edit Pipeline

Phase 1 (Top) constructs a structural prior by extracting semantic masks \(\mathcal{M}\), computing geometric signatures \(\mathbf{s}_k\), and forming a single-channel structural map \(\Phi_{\text{map}}\) via a fixed randomized projection \(\psi\). This map is then resized to the latent resolution, normalized to \([-1,1]\), and broadcast across latent channels to obtain \(\Phi_{\mathcal{Z}}\).
Phase 2 (Bottom) executes rectified flow integration in latent space. Starting from \(Z_{t_{\max}}^{\text{FE}} = Z_{\text{src}} = \mathcal{E}(X_{\text{src}})\), the dynamics iteratively update \(Z^{\text{FE}}\) using a rectified velocity field driven by \(\tilde{\epsilon} = \lambda_{\text{struct}} \Phi_{\mathcal{Z}} + \lambda_{\text{stoch}} \xi\), which mixes \(\Phi_{\mathcal{Z}}\) and Gaussian noise \(\xi\). This mechanism encourages the output \(X_{\text{tar}} = \mathcal{D}(Z^{\text{FE}}_0)\) to preserve the source layout while realizing the target semantics.
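A rough sketch of the Phase 2 loop is given below. It assumes a FlowEdit-style Euler update, in which the target velocity is evaluated at the re-anchored position \(Z^{\text{FE}}_t + \tilde{Z}^{\text{src}}_t - Z_{\text{src}}\) and the difference of target and source velocities drives \(Z^{\text{FE}}\); the exact update rule, the function name `snr_edit_sketch`, and the toy velocity callables are assumptions for illustration and do not reproduce the actual SD3 or FLUX models.

```python
import numpy as np

def snr_edit_sketch(z_src, phi_z, v_src, v_tar, timesteps,
                    lam_struct, lam_stoch, rng):
    """Integrate from t_max down to 0 with rectified noise (illustrative only).

    v_src, v_tar: callables (z, t) -> velocity, standing in for the model
    conditioned on the source and target prompts.
    timesteps: decreasing sequence starting at t_max and ending at 0.
    """
    z_fe = z_src.copy()                        # Z_FE at t_max starts at Z_src
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        xi = rng.standard_normal(z_src.shape)  # fresh Gaussian noise
        eps_tilde = lam_struct * phi_z + lam_stoch * xi
        # Corrected source state on the rectified noise path.
        z_src_t = (1.0 - t_cur) * z_src + t_cur * eps_tilde
        # Re-anchored position for the target velocity (assumed FlowEdit-style).
        z_tar_t = z_fe + z_src_t - z_src
        v_delta = v_tar(z_tar_t, t_cur) - v_src(z_src_t, t_cur)
        # Euler step; t_next < t_cur, so the step size is negative.
        z_fe = z_fe + (t_next - t_cur) * v_delta
    return z_fe

rng = np.random.default_rng(0)
shape = (16, 4, 4)
z_src = rng.standard_normal(shape)
phi_z = rng.uniform(-1.0, 1.0, shape)
timesteps = np.linspace(1.0, 0.0, 11)

# Toy check 1: identical source and target velocities leave the latent unchanged.
same = lambda z, t: -z
z_same = snr_edit_sketch(z_src, phi_z, same, same, timesteps, 0.3, 0.7, rng)

# Toy check 2: constant velocity difference of 1 integrated from t=1 to t=0.
zeros = lambda z, t: np.zeros_like(z)
ones = lambda z, t: np.ones_like(z)
z_shift = snr_edit_sketch(z_src, phi_z, zeros, ones, timesteps, 0.3, 0.7, rng)
```

With matching source and target velocities the velocity difference vanishes at every step, so the sketch returns the source latent; this mirrors the intent that only the semantic difference between prompts moves \(Z^{\text{FE}}\) away from \(Z_{\text{src}}\).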

Visualization of image editing results on PIE-Bench

On both FLUX and SD3 backbones, our method better preserves structural details, maintains more accurate text-image correspondence, and achieves higher overall visual quality than existing approaches.

Visualization of image editing results on SNR-Bench

In these examples, our method outperforms existing baselines on both FLUX and SD3 backbones in maintaining structural integrity, aligning with text prompts, and generating high-quality visuals across a diverse set of challenging inputs.

Related Work

Ravi, Nikhila, et al. "SAM 2: Segment Anything in Images and Videos", arXiv:2408.00714, 2024.
Kulikov, Vladimir, et al. "FlowEdit: Inversion-Free Text-Based Editing using Pre-trained Flow Models", ICCV 2025.
Xie, Chenxi, et al. "DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing", arXiv:2506.01430, 2025.