ControlNet is an add-on neural network that works alongside Stable Diffusion to give you more precise control over image generation. Instead of relying on a text prompt alone, ControlNet lets you feed in extra image-based guidance (such as sketches, edges, or poses) so the generated image better matches your vision.
Stable Diffusion generates images from text, but the output can be unpredictable and may not match what you want in terms of composition or structure.

ControlNet solves this by providing a strong conditioning signal: it tells the model where and how to generate shapes and layouts based on the extra input you provide (like a rough sketch or an edge map).
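To make this concrete, here is a minimal sketch of Canny-based conditioning using Hugging Face's diffusers library (plus opencv-python for the edge extraction). The file names, prompt, and thresholds are placeholders; the model IDs are the standard public checkpoints.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract a Canny edge map from a source image; this is the control signal.
source = np.array(Image.open("input.png").convert("RGB"))  # placeholder file
edges = cv2.Canny(source, 100, 200)          # low/high thresholds (tune these)
edges = np.stack([edges] * 3, axis=-1)       # 1-channel edges -> 3-channel RGB
control_image = Image.fromarray(edges)

# Attach the Canny ControlNet to a Stable Diffusion v1.5 pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The edge map constrains composition; the prompt controls content and style.
result = pipe(
    "a futuristic city at sunset, photorealistic",  # placeholder prompt
    image=control_image,
    num_inference_steps=30,
).images[0]
result.save("output.png")
```

The same pattern works for every control type in the table below: swap in the matching preprocessor and ControlNet checkpoint.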
| Control type | Description | Example use case |
|---|---|---|
| Canny | Edge detection from an image using the Canny algorithm | Generate an image that respects the outline of an existing drawing or photo |
| HED (Holistically-Nested Edge Detection) | A more sophisticated edge detector that captures soft edges and fine detail | Guide generation with softer, more detailed edges than Canny provides |
| OpenPose | Extracts human body pose keypoints (stick figure style) | Generate images with a specific human pose or movement |
| Depth | Depth map that encodes distance information of objects | Generate images with correct 3D perspective and depth |
| Normal Map | Surface orientation map that helps with lighting and 3D shape | Generate images with realistic lighting and surface detail |
| M-LSD (Mobile Line Segment Detection) | Detects straight line segments | Generate architectural or other highly structured scenes |
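Each row corresponds to a preprocessor that turns an ordinary image into the conditioning map the matching ControlNet expects. As a rough sketch, the controlnet_aux helper package bundles several of these detectors (the lllyasviel/Annotators repo hosts the detector weights; input.png is a placeholder):

```python
from controlnet_aux import HEDdetector, MLSDdetector, OpenposeDetector
from PIL import Image

source = Image.open("input.png").convert("RGB")  # placeholder file

# Each detector maps a regular image to the conditioning image for the
# corresponding ControlNet checkpoint (e.g. sd-controlnet-openpose).
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_map = openpose(source)    # stick-figure keypoints for pose control

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
soft_edges = hed(source)      # soft, detail-preserving edge map

mlsd = MLSDdetector.from_pretrained("lllyasviel/Annotators")
line_map = mlsd(source)       # straight line segments for structured scenes
```

Depth and normal maps usually come from a separate monocular depth or normal estimator rather than a simple detector, but they plug into the pipeline the same way.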