Style Blending with Artist Regularization
Introduction
Traditional regularization images serve a clear purpose: they anchor the model to its prior knowledge so it doesn't overfit to your training set, but they aren't meant to teach it anything new. But what if you actually want your model to learn something from those "regularization" images, just not as much as from your primary dataset?
This guide introduces a technique for training LoRAs that deliberately incorporate style elements from multiple artists while maintaining your target artist’s dominant characteristics. It’s for those special occasions when you want to say “I want it to be mostly like artist A, but with a hint of artists B, C, and D’s approach to [specific element].”
The Problem with Standard Regularization
When training character or style LoRAs, we typically use one of two approaches:
- Traditional regularization: Uses unrelated or generic class images to prevent overfitting, but those images contribute nothing stylistically
- No regularization: Risks overfitting but preserves pure style
But what if we want something in between? What if similar artists’ work could make our model better rather than just less overfitted?
The Solution: Controlled Style Contribution
Instead of treating similar artists’ work as regularization images (which get special treatment in the training process), we can treat them as supplementary training data with carefully controlled influence.
Here’s how to implement this technique:
1. Configure Your Dataset Structure
Create a config.toml that separates your primary artist and supplementary artists into distinct dataset groups:
[general]
flip_aug = true
color_aug = false
keep_tokens_separator = "|||"
shuffle_caption = true
caption_extension = ".txt"
caption_prefix = "masterpiece, best quality, newest, absurdres, highres"
caption_dropout_rate = 0.05
# Primary artist dataset (full strength)
[[datasets]]
enable_bucket = true
resolution = [1024, 1024]
[[datasets.subsets]]
image_dir = "/path/to/primary_artist/images"
num_repeats = 2 # Give primary artist more weight
caption_prefix = "primary_artist |||"
# Supplementary artists (reduced influence)
[[datasets]]
enable_bucket = true
resolution = [1024, 1024]
network_multiplier = 0.3 # ⚠️ KEY SETTING: Reduces gradient contribution
caption_tag_dropout_rate = 0.7 # Higher dropout for style generalization
caption_dropout_rate = 0.15 # Occasionally train on pure images
[[datasets.subsets]]
image_dir = "/path/to/similar_artist1/images"
num_repeats = 1
caption_prefix = "similar_artist1 |||"
class_tokens = ""
keep_tokens = 1
[[datasets.subsets]]
image_dir = "/path/to/similar_artist2/images"
num_repeats = 1
caption_prefix = "similar_artist2 |||"
class_tokens = ""
keep_tokens = 1
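Before launching a run, it can be worth double-checking that the TOML parses and that each subset picked up the repeats and multiplier you intended. A quick standalone check, separate from sd-scripts (assumes Python 3.11+ for tomllib):
import tomllib  # Python 3.11+; use the third-party `toml` package on older versions

with open("config.toml", "rb") as f:
    cfg = tomllib.load(f)

# List every subset with its effective num_repeats and the dataset-level multiplier
for i, ds in enumerate(cfg.get("datasets", [])):
    mult = ds.get("network_multiplier", 1.0)  # sd-scripts default is 1.0
    for sub in ds.get("subsets", []):
        print(f"dataset {i}: {sub['image_dir']}  "
              f"repeats={sub.get('num_repeats', 1)}  multiplier={mult}")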
2. Understanding the Key Parameters
- network_multiplier: The secret sauce. It scales how strongly these images influence the LoRA during training (0.3 = roughly 30% of the influence the primary dataset has); see the sketch after this list for how it combines with num_repeats
- num_repeats: Controls how often images from each dataset appear during each training epoch
- caption_tag_dropout_rate: Higher values (0.6-0.7) prevent the model from learning specific caption patterns from supplementary artists
- caption_dropout_rate: Occasionally trains on images with no caption at all (pure visual learning)
- keep_tokens & caption_prefix: Ensure the artist tags remain as stable identifiers even when other tags are shuffled or dropped
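To build intuition for how num_repeats and network_multiplier interact, here is a rough back-of-the-envelope sketch. The weighting model (influence ≈ images × num_repeats × network_multiplier) and the image counts are simplifying assumptions for illustration, not sd-scripts' exact sampling math:
# Back-of-the-envelope only: relative "pull" of each dataset on the LoRA.
# Assumes influence ~ images * num_repeats * network_multiplier, a simplification
# of how sd-scripts samples batches and applies the multiplier.
datasets = {
    "primary_artist":  {"images": 200, "num_repeats": 2, "network_multiplier": 1.0},
    "similar_artist1": {"images": 150, "num_repeats": 1, "network_multiplier": 0.3},
    "similar_artist2": {"images": 150, "num_repeats": 1, "network_multiplier": 0.3},
}

def pull(d):
    return d["images"] * d["num_repeats"] * d["network_multiplier"]

total = sum(pull(d) for d in datasets.values())
for name, d in datasets.items():
    print(f"{name:16s} {pull(d) / total:5.1%} of estimated influence")
# primary_artist 81.6% / similar_artist1 9.2% / similar_artist2 9.2%
# (image counts here are made-up placeholders)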
Why Set class_tokens = "" for Supplementary Artists?
The class_tokens parameter acts as a fallback when no caption file is found for an image. Looking at the sd-scripts code in train_util.py, we can see exactly how it works:
cap_for_img = read_caption(img_path, subset.caption_extension, subset.enable_wildcard)
if cap_for_img is None and subset.class_tokens is None:
    logger.warning(
        f"neither caption file nor class tokens are found. use empty caption for {img_path}"
    )
    captions.append("")
    missing_captions.append(img_path)
else:
    if cap_for_img is None:
        captions.append(subset.class_tokens)  # Uses empty string when no caption found
        missing_captions.append(img_path)
    else:
        captions.append(cap_for_img)
Setting class_tokens = "" ensures that if caption files are missing, the images won't get random default text applied during training. This gives you explicit control over what captions are used, preventing any unintended text influence.
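To make the caption handling concrete, here is a small simulation of roughly what a supplementary-artist caption looks like once the prefix, the "|||" separator, shuffling, tag dropout, and whole-caption dropout are applied. This is a simplified re-implementation for illustration, not the actual sd-scripts code path:
import random

def simulate_caption(file_caption, prefix="similar_artist1 |||",
                     caption_dropout_rate=0.15, tag_dropout_rate=0.7):
    # caption_dropout_rate: occasionally the whole caption is cleared (pure visual learning)
    if random.random() < caption_dropout_rate:
        return ""
    caption = f"{prefix} {file_caption}"
    # Everything before "|||" is pinned in place; the rest can be shuffled and dropped
    fixed, _, rest = caption.partition("|||")
    tags = [t.strip() for t in rest.split(",") if t.strip()]
    tags = [t for t in tags if random.random() >= tag_dropout_rate]  # caption_tag_dropout_rate
    random.shuffle(tags)                                             # shuffle_caption
    return ", ".join([fixed.strip()] + tags)

print(simulate_caption("1girl, solo, short hair, looking at viewer, outdoors"))
# e.g. "similar_artist1, outdoors" -- the artist tag always survives,
# most descriptive tags are dropped, and the survivors are shuffled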
How network_multiplier Actually Works
The network_multiplier parameter is perhaps the most important for this technique. Looking at the sd-scripts implementation, each dataset group can have its own multiplier value. In train_util.py, we can see how this value is stored:
# in BaseDataset.__getitem__
example["network_multipliers"] = torch.FloatTensor([self.network_multiplier] * len(captions))
This multiplier is then passed through the training pipeline and applied to the LoRA network during training. In the LoRA implementation (networks/lora.py), the multiplier controls how much influence each batch of images has on the LoRA weights:
def set_multiplier(self, multiplier):
    self.multiplier = multiplier
    for lora in self.text_encoder_loras + self.unet_loras:
        lora.multiplier = self.multiplier
Setting network_multiplier = 0.3 for a dataset means the gradients from those images will have 30% of the impact they would normally have, essentially reducing their influence on what the network learns without completely removing them as learning material.
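In other words, the multiplier scales the LoRA branch's contribution to each forward pass, which in turn scales the gradients that flow back into the LoRA weights. Here is a minimal sketch of that idea; it is not the actual networks/lora.py module (which also covers Conv2d layers, dropout variants, and more), just the core mechanism:
import torch
import torch.nn as nn

class TinyLoRALinear(nn.Module):
    """Illustrative only: output = base(x) + multiplier * scale * up(down(x))."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)        # only the LoRA weights train
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)         # LoRA starts as a no-op
        self.scale = alpha / rank
        self.multiplier = 1.0                  # set per batch from network_multipliers

    def forward(self, x):
        return self.base(x) + self.up(self.down(x)) * self.multiplier * self.scale

layer = TinyLoRALinear(nn.Linear(16, 16))
layer.multiplier = 0.3   # supplementary-artist batch: LoRA applied at 30% strength
y = layer(torch.randn(1, 16))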
Training Script Example
Here’s a sample training script that implements this approach with NoobAI:
#!/usr/bin/env zsh
set -e -o pipefail
NAME="style-blend-lora-v1s1600"
TRAINING_DIR="/path/to/dataset_directory"
STEPS=1600
SD_SCRIPT="${SD_SCRIPT:-sdxl_train_network.py}"
SD_REPO="${SD_REPO:-$HOME/toolkit/diffusion/sd-scripts}"
args=(
# Basic settings
--debiased_estimation_loss
--max_token_length=225
--keep_tokens=1
--keep_tokens_separator="|||"
# Base model
--pretrained_model_name_or_path=/path/to/noobaiXLVpredv10.safetensors
--v_parameterization
--zero_terminal_snr
# Dataset
--dataset_config="$TRAINING_DIR/config.toml" # Points to our custom config
--resolution="1024,1024"
--enable_bucket
--bucket_reso_steps=64
--min_bucket_reso=256
--max_bucket_reso=2048
--flip_aug
--shuffle_caption
--cache_latents
--cache_latents_to_disk
# Network config
--network_dim=100000
--network_alpha=64
--network_module="lycoris.kohya"
--network_args
"preset=full"
"conv_dim=100000"
"decompose_both=False"
"conv_alpha=64"
"rank_dropout=0"
"module_dropout=0"
"use_tucker=True"
"use_scalar=False"
"rank_dropout_scale=False"
"algo=lokr"
"bypass_mode=False"
"factor=16"
"dora_wd=True"
"train_norm=False"
# Optimizer & LR
--optimizer_type=SAVEUS # Adjust based on your preferences
--train_batch_size=14
--max_grad_norm=1
--gradient_checkpointing
--lr_warmup_steps=100
--learning_rate=0.0004
--unet_lr=0.0004
--text_encoder_lr=0.0001
--lr_scheduler="cosine"
--lr_scheduler_args="num_cycles=0.375"
# Technical settings
--no_half_vae
--sdpa
--mixed_precision="bf16"
# Saving & sampling
--max_train_steps=$STEPS
--save_model_as="safetensors"
--save_precision="fp16"
--save_every_n_steps=100
--sample_every_n_steps=100
--sample_sampler="euler"
--sample_at_first
--caption_extension=".txt"
--sample_prompts="$TRAINING_DIR/sample-prompts.txt"
)
# Run the training
source "$HOME/toolkit/zsh/train_functions.zsh"
setup_training_vars "$NAME"
args+=(
--output_dir="$OUTPUT_DIR/$NAME"
--output_name="$NAME"
--log_prefix="$NAME-"
--logging_dir="$OUTPUT_DIR/logs"
"$@"
)
LYCORIS_REPO=$(get_lycoris_repo)
trap cleanup_empty_output EXIT TERM
store_commits_hashes "$SD_REPO" "$LYCORIS_REPO"
run_training_script "$SD_REPO/$SD_SCRIPT" "${args[@]}"
Understanding Caption Dropout
Caption dropout is an underappreciated but powerful feature in sd-scripts. The code shows exactly how it works:
# From process_caption in train_util.py
is_drop_out = subset.caption_dropout_rate > 0 and random.random() < subset.caption_dropout_rate
is_drop_out = (
    is_drop_out
    or (subset.caption_dropout_every_n_epochs > 0
        and self.current_epoch % subset.caption_dropout_every_n_epochs == 0)
)

# If dropout condition is met, clear the caption
if is_drop_out:
    caption = ""
For supplementary artists, a higher caption dropout rate (0.15-0.2) gives you more purely visual learning: the model absorbs the look of those images without being steered by their specific textual descriptions.
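As a quick sanity check on what these numbers mean in combination, here is a tiny back-of-the-envelope calculation. It assumes each tag is dropped independently and that tag dropout only applies when the whole caption has not already been cleared:
caption_dropout_rate = 0.15   # whole caption cleared
tag_dropout_rate = 0.70       # each tag after "|||" dropped independently

p_no_caption = caption_dropout_rate
p_tag_seen = (1 - caption_dropout_rate) * (1 - tag_dropout_rate)

print(f"steps where an image trains with no caption at all: {p_no_caption:.0%}")   # 15%
print(f"chance a given descriptive tag is seen on a given step: {p_tag_seen:.0%}") # ~26%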
Results and Benefits
Using this approach instead of traditional regularization produces LoRAs that:
- Maintain primary style dominance: Your target artist remains the strongest influence
- Incorporate complementary techniques: Learn specific elements from similar artists
- Achieve better generalization: More varied training data improves adaptability
- Allow fine-grained control: Adjust influence levels per dataset for perfect balance
It’s particularly effective for character LoRAs where you want the general style of one artist but prefer other artists’ approaches to specific elements like faces, anatomy, or clothing folds.
Experimentation Tips
- Start conservative: Begin with network_multiplier around 0.3 and adjust up/down based on results
- Watch the samples: Pay attention to when supplementary styles start becoming too prominent
- Adjust caption dropout: If caption concepts start bleeding between artists, increase caption_tag_dropout_rate
- Try different num_repeats ratios: For extreme cases, use 3:1 or 4:1 primary:supplementary ratios
Remember: training is cheap; clicking the generate button 300 times to find the right prompt is expensive. Let your LoRA do the heavy lifting by incorporating the style elements you want from the beginning!