Perturbed Attention Guidance (PAG)


Introduction


Perturbed Attention Guidance (PAG) is an advanced technique used in diffusion models to enhance image quality and structural coherence. Unlike other enhancement methods that require additional training or model modifications, PAG works by cleverly manipulating the sampling process itself, making it both efficient and relatively easy to implement.

What is Perturbed Attention Guidance?


Perturbed Attention Guidance improves sample quality by guiding the diffusion process using self-attention mechanisms. It works by creating intermediate samples with degraded structure (by replacing self-attention maps with identity matrices) and then steering the generation away from these degraded samples. This enhances structural coherence without requiring additional training or modules.

Mathematical Foundation


Mathematically, PAG applies a guidance signal proportional to the difference between normal denoising and perturbed denoising:

$$ \epsilon_\text{PAG}(\mathbf{x}_t, c, t) = \epsilon_\theta(\mathbf{x}_t, c, t) + s \cdot [\epsilon_\theta(\mathbf{x}_t, c, t) - \epsilon_\theta^\text{perturbed}(\mathbf{x}_t, c, t)] $$

Where:

  • $\epsilon_\theta(\mathbf{x}_t, c, t)$ is the standard denoising prediction
  • $\epsilon_\theta^\text{perturbed}(\mathbf{x}_t, c, t)$ is the denoising prediction with perturbed attention maps
  • $s$ is the PAG scale that controls guidance strength

The perturbation replaces the self-attention map in selected blocks with the identity matrix, so the attention output collapses to the value matrix:

$$ \text{Att}^\text{perturbed}(Q, K, V) = V $$

Instead of the normal attention calculation:

$$ \text{Att}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d}}\right)V $$
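
To make the contrast concrete, here is a minimal PyTorch sketch (illustrative only, not ComfyUI's internal code) of the two computations, where q, k, and v are token-by-dimension tensors:

import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Standard scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    d = q.shape[-1]
    weights = F.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)
    return weights @ v

def perturbed_attention(q, k, v):
    # PAG perturbation: the attention map becomes the identity matrix,
    # so each token keeps only its own value vector
    return v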

How It Works


  1. Normal Denoising Process: The model computes a standard denoising prediction.
  2. Perturbed Denoising: The model computes a second prediction with perturbed attention maps.
  3. Guidance Application: The difference between these predictions is scaled and added to the normal prediction.

This approach essentially tells the model “don’t generate images that look like those with degraded attention,” pushing it to create images with stronger structural coherence.
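
In pseudocode, a single PAG-guided step combines the two predictions as follows. This is a conceptual sketch: denoise and denoise_with_identity_attention are hypothetical helpers standing in for the model call with normal and perturbed attention, respectively.

def pag_step(x_t, cond, t, scale):
    # 1. Normal denoising prediction
    eps_normal = denoise(x_t, cond, t)
    # 2. Prediction with self-attention maps replaced by the identity
    eps_perturbed = denoise_with_identity_attention(x_t, cond, t)
    # 3. Steer away from the structurally degraded prediction
    return eps_normal + scale * (eps_normal - eps_perturbed)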

Benefits and Applications


PAG offers several key advantages:

  1. Improved Structure: More coherent and anatomically correct subjects
  2. Enhanced Composition: Better spatial relationships between elements
  3. No Additional Training: Works with existing models without fine-tuning
  4. Adjustable Strength: The guidance scale can be tuned for different images and models

PAG is particularly useful for:

  • Complex scenes with multiple subjects
  • Images requiring precise anatomical details
  • High-resolution generations where structural issues often emerge
  • Situations where fine-tuning models is impractical

Implementation in ComfyUI


In ComfyUI, PAG is implemented through dedicated nodes that modify the denoising process. The primary node available is the PerturbedAttentionGuidance node, which can be found in the “model_patches/unet” category.

Available PAG Nodes

  1. PerturbedAttentionGuidance - A basic implementation that provides essential functionality with minimal complexity:

    • Takes a model and applies PAG with a single scale parameter
    • Scale parameter (default: 3.0, range: 0.0-100.0) controls the strength of guidance
    • Works by patching the middle UNet block’s self-attention mechanism
    • More resistant to breaking with ComfyUI updates due to its simplicity
  2. PerturbedAttentionGuidance (Advanced) - Available from pamparamm’s repository, this version offers more options and greater flexibility:

    • Provides additional configuration parameters beyond the basic scale
    • Allows for more fine-grained control over the PAG implementation
    • May require occasional updates to maintain compatibility with ComfyUI

Implementation Details

The PAG node works by modifying the attention mechanism in the UNet:

import comfy.model_patcher
import comfy.samplers

def perturbed_attention(q, k, v, extra_options, mask=None):
    # Replace self-attention with the identity function: output V directly
    return v

def post_cfg_function(args):
    # Unpack the post-CFG arguments provided by the sampler
    model = args["model"]
    cond_pred = args["cond_denoised"]
    cond = args["cond"]
    cfg_result = args["denoised"]
    sigma = args["sigma"]
    model_options = args["model_options"].copy()
    x = args["input"]

    # Replace self-attention ("attn1") in the target UNet block with the perturbed version
    model_options = comfy.model_patcher.set_model_options_patch_replace(
        model_options, perturbed_attention, "attn1", unet_block, unet_block_id)
    (pag,) = comfy.samplers.calc_cond_batch(model, [cond], x, sigma, model_options)

    # Apply the PAG formula: result + scale * (normal_prediction - perturbed_prediction)
    return cfg_result + (cond_pred - pag) * scale
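
In the node, this post_cfg_function is registered on a cloned model so that it runs after classifier-free guidance has been applied, which is why PAG composes cleanly with the usual CFG scale. The registration looks roughly like this (a sketch based on the upstream node; treat the exact method name as an assumption):

# Inside the node's patch function
m = model.clone()
unet_block, unet_block_id = "middle", 0  # the basic node targets the middle UNet block
m.set_model_sampler_post_cfg_function(post_cfg_function)
return (m,)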

Using PAG in Your Workflow

To incorporate PAG into your ComfyUI workflow:

  1. Add the PerturbedAttentionGuidance node to your workflow

  2. Connect your model to the node’s input

  3. Adjust the scale parameter based on your needs:

    • Lower values (0.5-2.0): Subtle structural improvements
    • Medium values (2.0-5.0): Balanced enhancement
    • Higher values (5.0+): Strong structural enforcement
  4. Connect the output to subsequent nodes in your generation pipeline

The key parameter is the PAG scale, which controls the strength of the guidance. Higher values enforce stronger structural coherence but may limit creativity, while lower values allow more freedom but provide less guidance.

Conclusion


Perturbed Attention Guidance represents a powerful mathematical approach to improving diffusion model outputs without the computational cost of additional training. By understanding how it works, you can leverage this technique to generate images with improved structure and coherence across a variety of models and scenarios.