PatchModelAddDownscale (Kohya DeepShrink)


Introduction


When generating high-resolution images with diffusion models, various issues often emerge, such as inconsistent anatomy, poor composition, or unnatural structures. The PatchModelAddDownscale technique, also known as Kohya DeepShrink, addresses these problems through a staged approach to the diffusion process that maintains quality while enabling higher resolutions.

What is PatchModelAddDownscale?


The PatchModelAddDownscale node (Kohya DeepShrink) enables higher-resolution generation with better consistency by applying strategic downscaling to the UNet during early denoising steps. It adds a downscaling operation to specific model blocks, allowing the model to establish composition at a lower resolution first before gradually transitioning to full resolution. This helps prevent consistency issues like incorrect anatomy or composition problems that often occur at higher resolutions.

Mathematical Foundation


The enhanced version (v2) implements a three-phase scaling process with a gradual transition:

  1. Full Downscale (when $\sigma_\text{start} \geq \sigma \geq \sigma_\text{end}$):

    $$h' = D\left(h, \frac{W}{f}, \frac{H}{f}\right)$$
  2. Gradual Transition (when $\sigma_\text{end} > \sigma \geq \sigma_\text{gradual}$):

    $$s(\sigma) = \frac{1}{f} + \left(1 - \frac{1}{f}\right) \cdot \frac{\sigma_\text{end} - \sigma}{\sigma_\text{end} - \sigma_\text{gradual}}$$

    $$h' = D(h, W \cdot s(\sigma), H \cdot s(\sigma))$$
  3. Original Size (when $\sigma < \sigma_\text{gradual}$):

    $$h' = h$$

Where:

  • $h$ is the latent representation at a given UNet block
  • $D$ is the downscaling function using the specified method
  • $W, H$ are the original width and height
  • $f$ is the downscale factor
  • $\sigma$ is the current noise level
  • $\sigma_\text{start}, \sigma_\text{end}, \sigma_\text{gradual}$ are noise levels corresponding to the percentage parameters
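As a sanity check on the transition formula, the sketch below evaluates $s(\sigma)$ for illustrative values $\sigma_\text{end} = 5.0$, $\sigma_\text{gradual} = 2.0$ and $f = 2$; these noise levels are made up for the example, not taken from a real sampler schedule:

```python
def transition_scale(sigma, sigma_end, sigma_gradual, f):
    """s(sigma) during the gradual phase: 1/f at sigma_end, 1.0 at sigma_gradual."""
    inv = 1.0 / f
    progress = (sigma_end - sigma) / (sigma_end - sigma_gradual)
    return inv + (1.0 - inv) * progress

# Illustrative noise levels (sigma decreases as sampling progresses)
print(transition_scale(5.0, 5.0, 2.0, 2.0))  # 0.5  -> still fully downscaled
print(transition_scale(3.5, 5.0, 2.0, 2.0))  # 0.75 -> halfway through the transition
print(transition_scale(2.0, 5.0, 2.0, 2.0))  # 1.0  -> back to original size
```

Note that the scale grows as $\sigma$ falls, so the latent smoothly expands back to full size as denoising proceeds.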

PatchModelAddDownscale v2 (Gradual Scaling / mk2)


import torch
import comfy.utils

class PatchModelAddDownscale_v2:
    """A UNet model patch that implements dynamic latent downscaling with gradual transition.
    
    This node is an enhanced version of the original PatchModelAddDownscale that adds smooth
    transition capabilities. It operates in three phases:
    
    1. Full Downscale (start_percent → end_percent):
       Latents are downscaled by the specified downscale_factor
       
    2. Gradual Transition (end_percent → gradual_percent):
       Latents smoothly transition from downscaled size back to original size
       
    3. Original Size (after gradual_percent):
       Latents remain at their original size
       
    The gradual transition helps prevent abrupt changes in the generation process,
    potentially leading to more consistent results.
    
    Parameters:
        model: The model to patch
        block_number: Which UNet block to apply the patch to
        downscale_factor: How much to shrink the latents by
        start_percent: When to start downscaling (in terms of sampling progress)
        end_percent: When to begin transitioning back to original size
        gradual_percent: When to complete the transition to original size
        downscale_after_skip: Whether to apply downscaling after skip connections
        downscale_method: Algorithm to use for downscaling
        upscale_method: Algorithm to use for upscaling
    
    Code by:
    - Original: https://github.com/Jordach + comfyanon + kohya-ss
    """
    
    upscale_methods = ["bicubic", "nearest-exact", "bilinear", "area", "bislerp"]
    
    @classmethod
    def INPUT_TYPES(s):
        return {"required": { 
            "model": ("MODEL",),
            "block_number": ("INT", {"default": 3, "min": 1, "max": 32, "step": 1}),
            "downscale_factor": ("FLOAT", {"default": 2.0, "min": 0.1, "max": 9.0, "step": 0.001}),
            "start_percent": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
            "end_percent": ("FLOAT", {"default": 0.35, "min": 0.0, "max": 1.0, "step": 0.001}),
            "gradual_percent": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 1.0, "step": 0.001}),
            "downscale_after_skip": ("BOOLEAN", {"default": True}),
            "downscale_method": (s.upscale_methods,),
            "upscale_method": (s.upscale_methods,),
        }}
    
    RETURN_TYPES = ("MODEL",)
    FUNCTION = "patch"
    CATEGORY = "model_patches/unet"

    def calculate_upscale_factor(self, sigma, sigma_rescale, sigma_end, downscale_factor):
        """Calculate the upscale factor during the gradual resize phase.

        Interpolates in sigma space: returns 1/downscale_factor at sigma_end
        (where the full-downscale phase ends) and 1.0 at sigma_rescale (where
        the latent is back to its original size). Note that sigma decreases
        as sampling progresses, so sigma_end > sigma_rescale."""
        if sigma >= sigma_end:
            return 1.0 / downscale_factor  # Still fully downscaled
        elif sigma <= sigma_rescale:
            return 1.0  # Fully back to original size
        else:
            # Linear interpolation between downscaled and original size
            progress = (sigma_end - sigma) / (sigma_end - sigma_rescale)
            scale_diff = 1.0 - (1.0 / downscale_factor)
            return (1.0 / downscale_factor) + (scale_diff * progress)

    def patch(self, model, block_number, downscale_factor, start_percent, end_percent,
             gradual_percent, downscale_after_skip, downscale_method, upscale_method):
        model_sampling = model.get_model_object("model_sampling")
        sigma_start = model_sampling.percent_to_sigma(start_percent)
        sigma_end = model_sampling.percent_to_sigma(end_percent)
        sigma_rescale = model_sampling.percent_to_sigma(gradual_percent)
        
        def input_block_patch(h, transformer_options):
            if downscale_factor == 1:
                return h

            if transformer_options["block"][1] == block_number:
                sigma = transformer_options["sigmas"][0].item()
                
                # Normal downscale behavior between start_percent and end_percent
                if sigma <= sigma_start and sigma >= sigma_end:
                    h = comfy.utils.common_upscale(
                        h, 
                        round(h.shape[-1] * (1.0 / downscale_factor)), 
                        round(h.shape[-2] * (1.0 / downscale_factor)), 
                        downscale_method,
                        "disabled"
                    )
                # Gradually upscale latent after end_percent until gradual_percent
                elif sigma < sigma_end and sigma >= sigma_rescale:
                    scale_factor = self.calculate_upscale_factor(
                        sigma, sigma_rescale, sigma_end, downscale_factor
                    )
                    h = comfy.utils.common_upscale(
                        h,
                        round(h.shape[-1] * scale_factor),
                        round(h.shape[-2] * scale_factor),
                        upscale_method,
                        "disabled"
                    )
            return h

        def output_block_patch(h, hsp, transformer_options):
            if h.shape[2] != hsp.shape[2]:
                h = comfy.utils.common_upscale(
                    h, hsp.shape[-1], hsp.shape[-2], 
                    upscale_method, "disabled"
                )
            return h, hsp

        m = model.clone()
        if downscale_after_skip:
            m.set_model_input_block_patch_after_skip(input_block_patch)
        else:
            m.set_model_input_block_patch(input_block_patch)
        m.set_model_output_block_patch(output_block_patch)
        return (m, )

NODE_CLASS_MAPPINGS = {
    "PatchModelAddDownscale_v2": PatchModelAddDownscale_v2,
}

NODE_DISPLAY_NAME_MAPPINGS = {
    # Sampling
    "PatchModelAddDownscale_v2": "PatchModelAddDownscale v2",
}

The mk2 (v2) implementation of PatchModelAddDownscale follows the three-phase scaling process described in the Mathematical Foundation section above. This approach improves upon the original by adding a smooth, gradual transition from downscaled to full resolution, preventing abrupt changes in the latent space that can cause artifacts or inconsistencies in the generated image.

The process is divided into three phases, based on the current noise level ($\sigma$) during denoising:

  1. Full Downscale Phase: The latent is downscaled by the specified factor.
  2. Gradual Transition Phase: The latent is smoothly upscaled from the downscaled size back to the original size.
  3. Original Size Phase: The latent remains at its original size.

For the detailed equations and variable definitions, see the Mathematical Foundation section above.

Motivation and Benefits

  • Smooth Transition: By gradually increasing the resolution, mk2 avoids the sudden jump in detail that can occur in the original implementation. This leads to more stable and consistent image generation, especially at high resolutions.
  • Reduced Artifacts: The gradual phase helps the model adapt from composing the image at low resolution to refining details at high resolution, reducing the risk of artifacts or structural inconsistencies.
  • User Control: The additional gradual_percent parameter allows users to control how quickly the transition occurs, offering more flexibility for different generation scenarios.

Practical Implications

  • When to Use mk2:
    • Recommended for high-resolution generations where consistency and detail are important.
    • Especially useful when the original DeepShrink causes visible artifacts or abrupt changes.
  • Parameter Tuning:
    • downscale_factor: Controls how much the latent is shrunk.
    • start_percent, end_percent: Define the range for full downscaling.
    • gradual_percent: Sets when the transition to full resolution completes.
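As a rough illustration of how these percent parameters partition a sampling run, the sketch below classifies a sampling-progress value into the three phases using the node's default values; keep in mind the real node compares noise levels obtained via percent_to_sigma rather than raw percentages:

```python
def phase_at(progress, start=0.0, end=0.35, gradual=0.6):
    """Return which mk2 phase a sampling progress value (0..1) falls in.

    Defaults match the node's default parameters; the actual node converts
    these percentages to sigmas with model_sampling.percent_to_sigma."""
    if progress < start or progress >= gradual:
        return "original_size"       # patch inactive / transition complete
    if progress <= end:
        return "full_downscale"      # phase 1: latent shrunk by the factor
    return "gradual_transition"      # phase 2: smoothly growing back

print(phase_at(0.10))  # full_downscale
print(phase_at(0.50))  # gradual_transition
print(phase_at(0.80))  # original_size
```

With the defaults, roughly the first third of sampling runs at reduced resolution, the next quarter transitions, and the remainder runs at full size.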

Code Logic Reference

In the code, the mk2 logic is implemented by checking the current noise level and applying the appropriate scaling:

  • Full Downscale:

    if sigma <= sigma_start and sigma >= sigma_end:
        h = downscale(h, factor)
    
  • Gradual Transition:

    elif sigma < sigma_end and sigma >= sigma_rescale:
        scale = interpolate(sigma, ...)
        h = upscale(h, scale)
    
  • Full Resolution:

    # h remains unchanged
    

This approach ensures a smooth, mathematically controlled transition from low to high resolution, improving the quality and coherence of high-resolution outputs.


Key Parameters


The effectiveness of DeepShrink depends on several key parameters:

  • downscale_factor ($f$): Typically 2.0, controls the amount of downscaling
  • start_percent and end_percent: Define when in the sampling process scaling occurs
  • gradual_percent: Controls when the transition to full resolution completes
  • downscale_method: Method used for downscaling (typically “bicubic”)

How It Works


DeepShrink operates on a simple but effective principle: diffusion models often perform better at establishing overall composition at lower resolutions, then refining details at higher resolutions. The technique implements this by:

  1. Early Steps (High Noise Levels): Downscaling the latent representation during the early denoising steps, which forces the model to focus on overall composition and structure.

  2. Transition Phase: Gradually increasing the resolution as the denoising progresses, which allows the model to smoothly adapt from broad strokes to finer details.

  3. Final Steps (Low Noise Levels): Processing at full resolution for the final steps, where the detailed elements are refined.

This approach bridges the gap between the model’s training resolution and the target generation resolution, resulting in more coherent outputs.
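To make the three steps concrete, the sketch below tracks the spatial size of the latent over a hypothetical 1024x1024 generation (a 128x128 latent with an 8x VAE) using the node defaults; for simplicity it interpolates linearly in sampling progress, whereas the node interpolates in sigma space:

```python
def latent_size_at(progress, h=128, w=128, f=2.0,
                   start=0.0, end=0.35, gradual=0.6):
    """Spatial size (height, width) of the latent at a sampling progress (0..1).

    Assumes a 1024x1024 image -> 128x128 latent (8x VAE) and the node's
    default parameters; interpolation is in progress space for illustration."""
    if start <= progress <= end:
        s = 1.0 / f                            # early steps: full downscale
    elif progress < gradual:
        s = 1.0 / f + (1.0 - 1.0 / f) * (progress - end) / (gradual - end)
    else:
        s = 1.0                                # final steps: original size
    return round(h * s), round(w * s)

print(latent_size_at(0.2))    # (64, 64)    composition established at half size
print(latent_size_at(0.475))  # (96, 96)    halfway through the transition
print(latent_size_at(0.9))    # (128, 128)  full resolution for detail work
```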

Benefits and Applications


PatchModelAddDownscale offers several key advantages:

  1. Better Composition: More coherent overall structure and composition
  2. Improved Anatomy: Fewer anatomical inconsistencies
  3. Higher Resolution Generation: Enables generation at resolutions beyond what the model was trained on
  4. Smoother Transitions: Gradual scaling prevents jarring shifts in the generation process

It’s particularly useful for:

  • Portrait generation where anatomical correctness is crucial
  • Complex scenes with multiple interacting elements
  • Any case where you want to generate at resolutions significantly higher than the model’s training resolution

Implementation in ComfyUI


In ComfyUI, the PatchModelAddDownscale node typically connects to a loaded model, modifying its internal behavior during sampling. The parameters can be tuned depending on the specific model and generation scenario, with higher downscale factors generally providing more stability at very high resolutions.
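As an illustration of that wiring, here is a hypothetical fragment of an API-format ComfyUI workflow; the node IDs and checkpoint name are placeholders, and only the model path through the patch node is shown:

```python
# Hypothetical fragment of a ComfyUI API-format workflow (placeholder IDs).
# Inputs reference other nodes as ["node_id", output_index].
workflow_fragment = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "example_model.safetensors"}},
    "2": {"class_type": "PatchModelAddDownscale_v2",
          "inputs": {
              "model": ["1", 0],  # patched model comes from the loader
              "block_number": 3,
              "downscale_factor": 2.0,
              "start_percent": 0.0,
              "end_percent": 0.35,
              "gradual_percent": 0.6,
              "downscale_after_skip": True,
              "downscale_method": "bicubic",
              "upscale_method": "bicubic",
          }},
    # A sampler node would then take its model input from ["2", 0]
}
```

The key point is that the patch node sits between the model loader and the sampler, so the sampler receives the patched clone rather than the original model.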

How to Add PatchModelAddDownscale mk2 to ComfyUI


To use the mk2 (gradual scaling) version of PatchModelAddDownscale in ComfyUI, add it as a custom node by following these steps:

  1. Locate your ComfyUI installation directory.
    This is the folder where you run python main.py to start ComfyUI.

  2. Navigate to the custom_nodes folder inside your ComfyUI directory.
    If it doesn’t exist, create it.

  3. Create a new file named deep_shrink_mk2.py in the custom_nodes folder.

  4. Copy the mk2 node code from the “PatchModelAddDownscale v2 (Gradual Scaling / mk2)” section above
    Paste it into your new deep_shrink_mk2.py file and save.

  5. Restart ComfyUI.
    After saving the file, close and restart ComfyUI to load the new custom node.

  6. Find and use the node in ComfyUI.
    In the ComfyUI interface, search for “PatchModelAddDownscale v2” in the node list. You can now use it in your workflows.


Tip:
If you want to update or modify the node, just edit the deep_shrink_mk2.py file and restart ComfyUI.


Conclusion


PatchModelAddDownscale represents an elegant mathematical solution to the challenges of high-resolution generation with diffusion models. By strategically applying downscaling during the early phases of the process, it enables models to produce more coherent and anatomically correct images at resolutions that would otherwise lead to inconsistent results.