Perturbed Attention Guidance (PAG)
Introduction
Perturbed Attention Guidance (PAG) is an advanced technique used in diffusion models to enhance image quality and structural coherence. Unlike other enhancement methods that require additional training or model modifications, PAG works by cleverly manipulating the sampling process itself, making it both efficient and relatively easy to implement.
What is Perturbed Attention Guidance?
Perturbed Attention Guidance improves sample quality by guiding the diffusion process using self-attention mechanisms. It works by creating intermediate samples with degraded structure (by replacing self-attention maps with identity matrices) and then steering the generation away from these degraded samples. This enhances structural coherence without requiring additional training or modules.
Mathematical Foundation
Mathematically, PAG applies a guidance signal proportional to the difference between normal denoising and perturbed denoising:
$$ \epsilon_\text{PAG}(\mathbf{x}_t, c, t) = \epsilon_\theta(\mathbf{x}_t, c, t) + s \cdot [\epsilon_\theta(\mathbf{x}_t, c, t) - \epsilon_\theta^\text{perturbed}(\mathbf{x}_t, c, t)] $$

Where:
- $\epsilon_\theta(\mathbf{x}_t, c, t)$ is the standard denoising prediction
- $\epsilon_\theta^\text{perturbed}(\mathbf{x}_t, c, t)$ is the denoising prediction with perturbed attention maps
- $s$ is the PAG scale that controls guidance strength
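The guidance combination itself is simple arithmetic. Here is a minimal NumPy sketch, with toy arrays standing in for the two noise predictions (the values and the `pag_combine` name are illustrative, not part of any library):

```python
import numpy as np

def pag_combine(eps_normal, eps_perturbed, scale):
    """PAG formula: push the prediction away from the perturbed one."""
    return eps_normal + scale * (eps_normal - eps_perturbed)

# Toy stand-ins for the two denoising predictions (illustrative values)
eps_normal = np.array([0.2, -0.1, 0.4])
eps_perturbed = np.array([0.5, -0.3, 0.1])

eps_pag = pag_combine(eps_normal, eps_perturbed, scale=3.0)
# eps_pag == [-0.7, 0.5, 1.3]: each component moves further away from the
# perturbed prediction than the normal prediction already was
```

Note that at `scale=0` the formula reduces to the unmodified prediction, which is why PAG can be dialed in smoothly.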
The perturbation replaces selected self-attention maps with the identity matrix, so the attention output reduces to the value matrix:

$$ \text{Att}^\text{perturbed}(Q, K, V) = V $$

instead of the normal attention calculation:

$$ \text{Att}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d}}\right)V $$

How It Works
- Normal Denoising Process: The model computes a standard denoising prediction.
- Perturbed Denoising: The model computes a second prediction with perturbed attention maps.
- Guidance Application: The difference between these predictions is scaled and added to the normal prediction.
This approach essentially tells the model “don’t generate images that look like those with degraded attention,” pushing it to create images with stronger structural coherence.
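The three steps above can be sketched end-to-end with NumPy. This is a hedged toy model, not ComfyUI's implementation: `toy_denoiser` stands in for the UNet as a single self-attention layer, and `pag_step` plays the role of one guided denoising step:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)

def standard_attention(Q, K, V):
    # Normal scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def identity_attention(Q, K, V):
    # PAG's perturbation: the attention map becomes the identity, output == V
    return V

def toy_denoiser(x, attn_fn):
    # Stand-in for the UNet: one self-attention layer over token features
    return attn_fn(x, x, x)

def pag_step(x, scale):
    cond_pred = toy_denoiser(x, standard_attention)  # 1. normal denoising pass
    perturbed = toy_denoiser(x, identity_attention)  # 2. perturbed pass
    return cond_pred + scale * (cond_pred - perturbed)  # 3. guided combination

x = np.random.default_rng(0).standard_normal((4, 8))  # 4 tokens, 8 features
guided = pag_step(x, scale=3.0)
```

With the identity perturbation, each token keeps only its own value vector and no information mixes between tokens, which is exactly the "degraded structure" PAG steers away from.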
Benefits and Applications
PAG offers several key advantages:
- Improved Structure: More coherent and anatomically correct subjects
- Enhanced Composition: Better spatial relationships between elements
- No Additional Training: Works with existing models without fine-tuning
- Adjustable Strength: The guidance scale can be tuned for different images and models
PAG is particularly useful for:
- Complex scenes with multiple subjects
- Images requiring precise anatomical details
- High-resolution generations where structural issues often emerge
- Situations where fine-tuning models is impractical
Implementation in ComfyUI
In ComfyUI, PAG is implemented through dedicated nodes that modify the denoising process. The primary node is PerturbedAttentionGuidance, which can be found in the “model_patches/unet” category.
Available PAG Nodes
PerturbedAttentionGuidance - A basic implementation that provides essential functionality with minimal complexity:
- Takes a model and applies PAG with a single scale parameter
- Scale parameter (default: 3.0, range: 0.0-100.0) controls the strength of guidance
- Works by patching the middle UNet block’s self-attention mechanism
- More resistant to breaking with ComfyUI updates due to its simplicity
PerturbedAttentionGuidance (Advanced) - Available from pamparamm’s repository, this version offers more options and greater flexibility:
- Provides additional configuration parameters beyond the basic scale
- Allows for more fine-grained control over the PAG implementation
- May require occasional updates to maintain compatibility with ComfyUI
Implementation Details
The PAG node works by modifying the attention mechanism in the UNet:
```python
# Simplified excerpt of the node's patch logic; model, cond, x, sigma,
# cfg_result, cond_pred, unet_block, etc. come from the surrounding node code.
def perturbed_attention(q, k, v, extra_options, mask=None):
    # Replace attention with an identity function: skip the softmax map, pass V through
    return v

def post_cfg_function(args):
    # ... existing processing (unpack model, cond, x, sigma, cfg_result, etc.) ...
    # Swap the self-attention ("attn1") in the chosen UNet block for the perturbed version
    model_options = comfy.model_patcher.set_model_options_patch_replace(
        model_options, perturbed_attention, "attn1", unet_block, unet_block_id)
    # Run the model again with perturbed attention to get the degraded prediction
    (pag,) = comfy.samplers.calc_cond_batch(model, [cond], x, sigma, model_options)
    # Apply the PAG formula: cfg_result + scale * (normal_prediction - perturbed_prediction)
    return cfg_result + (cond_pred - pag) * scale
```
Using PAG in Your Workflow
To incorporate PAG into your ComfyUI workflow:
- Add the PerturbedAttentionGuidance node to your workflow
- Connect your model to the node’s input
- Adjust the scale parameter based on your needs:
  - Lower values (0.5-2.0): Subtle structural improvements
  - Medium values (2.0-5.0): Balanced enhancement
  - Higher values (5.0+): Strong structural enforcement
- Connect the output to subsequent nodes in your generation pipeline
The key parameter is the PAG scale, which controls the strength of the guidance. Higher values enforce stronger structural coherence but may limit creativity, while lower values allow more freedom but provide less guidance.
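Because the correction term is linear in $s$, the push away from the perturbed prediction grows in direct proportion to the scale. A quick NumPy check with toy predictions (values are illustrative):

```python
import numpy as np

eps_normal = np.array([0.2, -0.1, 0.4])     # toy normal prediction
eps_perturbed = np.array([0.5, -0.3, 0.1])  # toy perturbed prediction

base = np.linalg.norm(eps_normal - eps_perturbed)
for scale in (0.5, 2.0, 5.0):
    eps_pag = eps_normal + scale * (eps_normal - eps_perturbed)
    deviation = np.linalg.norm(eps_pag - eps_normal)
    # deviation / base equals the scale exactly: 0.5x, 2.0x, 5.0x
    print(f"scale={scale}: deviation = {deviation:.3f} ({deviation / base:.1f}x base)")
```

This linearity is why a small scale change is usually enough when tuning: doubling the scale exactly doubles how hard the sampler is pushed.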
Conclusion
Perturbed Attention Guidance represents a powerful mathematical approach to improving diffusion model outputs without the computational cost of additional training. By understanding how it works, you can leverage this technique to generate images with improved structure and coherence across a variety of models and scenarios.