ControlNet is an add-on neural network that works alongside Stable Diffusion to give you more precise control over image generation. Instead of relying on a text prompt alone, ControlNet lets you feed in extra image-based guidance (such as sketches, edges, or poses) so the generated image better matches your vision.
Stable Diffusion generates images from text, but the output can be unpredictable and may not match what you want in terms of composition or structure.

ControlNet solves this by providing a strong conditioning signal: it tells the model where and how to generate shapes and layouts based on the extra input you provide (like a rough sketch or an edge map).
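To make this concrete, here is a minimal sketch of Canny-based conditioning using Hugging Face's diffusers library (plus opencv-python for the edge extraction). The file names, prompt, and thresholds are placeholders; the model IDs are the standard public checkpoints.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract a Canny edge map from a source image; this is the control signal.
source = np.array(Image.open("input.png").convert("RGB"))  # placeholder file
edges = cv2.Canny(source, 100, 200)          # low/high thresholds (tune these)
edges = np.stack([edges] * 3, axis=-1)       # 1-channel edges -> 3-channel RGB
control_image = Image.fromarray(edges)

# Attach the Canny ControlNet to a Stable Diffusion v1.5 pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The edge map constrains composition; the prompt controls content and style.
result = pipe(
    "a futuristic city at sunset, photorealistic",  # placeholder prompt
    image=control_image,
    num_inference_steps=30,
).images[0]
result.save("output.png")
```

The same pattern works for every control type in the table below: swap in the matching preprocessor and ControlNet checkpoint.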
| Control type | Description | Example use case |
|---|---|---|
| Canny | Edge detection from an image using the Canny algorithm | Generate an image that respects the outline of an existing drawing or photo |
| HED (Holistically-Nested Edge Detection) | A more sophisticated edge detector that captures soft edges and fine detail | Guide generation with softer, more detailed edges than Canny provides |
| OpenPose | Extracts human body pose keypoints (stick figure style) | Generate images with a specific human pose or movement |
| Depth | Depth map that encodes distance information of objects | Generate images with correct 3D perspective and depth |
| Normal Map | Surface orientation map that helps with lighting and 3D shape | Generate images with realistic lighting and surface detail |
| M-LSD (Mobile Line Segment Detection) | Detects straight line segments | Generate architectural or other highly structured scenes |
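Each row corresponds to a preprocessor that turns an ordinary image into the conditioning map the matching ControlNet expects. As a rough sketch, the controlnet_aux helper package bundles several of these detectors (the lllyasviel/Annotators repo hosts the detector weights; input.png is a placeholder):

```python
from controlnet_aux import HEDdetector, MLSDdetector, OpenposeDetector
from PIL import Image

source = Image.open("input.png").convert("RGB")  # placeholder file

# Each detector maps a regular image to the conditioning image for the
# corresponding ControlNet checkpoint (e.g. sd-controlnet-openpose).
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_map = openpose(source)    # stick-figure keypoints for pose control

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
soft_edges = hed(source)      # soft, detail-preserving edge map

mlsd = MLSDdetector.from_pretrained("lllyasviel/Annotators")
line_map = mlsd(source)       # straight line segments for structured scenes
```

Depth and normal maps usually come from a separate monocular depth or normal estimator rather than a simple detector, but they plug into the pipeline the same way.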