ControlNet is an add-on neural network that works alongside Stable Diffusion to give you more precise control over image generation. Instead of relying on a text prompt alone, ControlNet lets you supply image-based guidance (such as sketches, edge maps, or poses) so that the generated image follows the composition you have in mind.


Why was ControlNet created?

Stable Diffusion generates images from text, but the output can be unpredictable in composition or structure: a prompt alone cannot pin down where objects sit, what pose a figure takes, or how a scene is laid out.

ControlNet addresses this by providing a strong spatial conditioning signal: the extra input you provide (such as a rough sketch or an edge map) tells the model where and how to generate particular shapes or layouts.


How does ControlNet work?

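  1. Base model frozen: The original Stable Diffusion weights are kept unchanged (frozen), so the base model's behavior is preserved.
  2. Add ControlNet module: A separate trainable network (ControlNet), initialized as a copy of the Stable Diffusion U-Net's encoder blocks, is trained to process the additional input (e.g., an edge map or a pose image).
  3. Conditioning: ControlNet's output features are added to the intermediate features of the main model through zero-initialized convolution layers, which guides the denoising process without disturbing the base model at the start of training.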
  4. Generation: During image generation, both the text prompt and ControlNet’s conditioning influence the final image.
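
The four steps above can be sketched with a toy, pure-Python model. This is only an illustration under stated assumptions, not the actual ControlNet architecture: the "layers" are small linear maps on lists, and the names `linear`, `frozen_weights`, `control_weights`, `zero_proj`, and `controlled_block` are hypothetical. It assumes, as described in the ControlNet paper, that the control branch starts as a copy of the base weights and is connected through a zero-initialized projection, so that before any training the output equals that of the base model.

```python
import copy

def linear(weights, x):
    """Apply a small dense layer: y[i] = sum_j weights[i][j] * x[j]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Step 1: the base block's weights are frozen (never modified below).
frozen_weights = [[0.5, -1.0], [2.0, 0.25]]

# Step 2: the control branch starts as a trainable copy of the base block.
control_weights = copy.deepcopy(frozen_weights)

# Zero-initialized projection connecting the control branch to the base
# block, so the branch contributes nothing before training.
zero_proj = [[0.0, 0.0], [0.0, 0.0]]

def controlled_block(x, condition, proj):
    """Steps 3-4: base output plus the projected control-branch features."""
    base_out = linear(frozen_weights, x)
    # The control branch sees the input combined with the conditioning signal.
    branch_in = [a + b for a, b in zip(x, condition)]
    residual = linear(proj, linear(control_weights, branch_in))
    return [b + r for b, r in zip(base_out, residual)]

x = [1.0, 2.0]           # stand-in for a latent feature vector
condition = [0.3, -0.7]  # stand-in for encoded edge or pose features

# With the zero projection, the output matches the frozen base block.
before = controlled_block(x, condition, zero_proj)

# Pretend training has moved the projection away from zero (made-up values).
trained_proj = [[0.1, 0.0], [0.0, 0.1]]
after = controlled_block(x, condition, trained_proj)
```

Because the projection starts at zero, adding the control branch does not change what the frozen model produces at first; the conditioning only takes effect as the projection and the copied weights are trained.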

Common Control Types (Input Modalities):

| Control Type | Description | Use case example |
|---|---|---|
| Canny | Edge map extracted from an image with the Canny algorithm | Generate an image that respects the outline of an existing drawing or photo |
| HED (Holistically-Nested Edge Detection) | Learned edge detector that produces softer, more continuous outlines than Canny | Similar to Canny, but keeps softer, more natural boundaries |
| OpenPose | Extracts human body pose keypoints (stick-figure style) | Generate images with a specific human pose or movement |
| Depth | Depth map that encodes the distance of objects from the camera | Generate images that preserve the spatial layout and relative depth of a reference scene |
| Normal Map | Surface orientation map that helps with lighting and 3D shape | Generate images whose lighting and surface shading follow a reference geometry |
| MLSD (Mobile Line Segment Detection) | Detects straight line segments | Architectural or other structured scenes |
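
To make the idea of an edge-style control image concrete, here is a minimal pure-Python sketch that turns a tiny grayscale grid into a binary edge map by thresholding the gradient magnitude. It is deliberately simpler than Canny (no Gaussian smoothing, non-maximum suppression, or hysteresis thresholding), and the function name `edge_map` and the threshold value are hypothetical. In practice you would likely produce the control image with a real detector, such as the Canny implementation in OpenCV.

```python
def edge_map(image, threshold=100):
    """Return a binary edge map (255 = edge, 0 = background).

    Uses central differences on interior pixels; border pixels stay 0.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal change
            gy = image[y + 1][x] - image[y - 1][x]  # vertical change
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges

# A 5x6 grayscale image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = edge_map(image)
```

The resulting map marks only the vertical boundary between the dark and bright halves. An image of this kind, white edges on a black background, is the sort of extra input a Canny-type ControlNet takes alongside the text prompt.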

Why use ControlNet?