Ep 10: ControlNet basics

9 min

What ControlNet does

Text prompts tell the AI what to generate, but they give you little control over how it's arranged. You can ask for "a person with their left hand raised," but where that hand ends up is mostly random.

ControlNet fixes this. You provide a reference image, and a preprocessor extracts specific structural information from it (a pose skeleton, a depth map, or an edge map). That structure guides how the AI composes the output. The result is a completely new image, generated from your prompt but arranged like your reference.

How it works: two stages

1. Preprocessing. Run your reference image through a preprocessor node. It pulls out one type of structural data (pose, depth, edges, etc.).
2. Generation. Feed that preprocessed image into a ControlNet model alongside your prompt. The diffusion model follows both, generating content from the prompt and arranging it according to the structure.

Types of ControlNet

Canny (edges)
Extracts every edge and outline, producing a black-and-white line map. The output closely matches the original's shapes and boundaries, which makes this the most faithful type. Good for architecture, product design, or any time you need exact outlines preserved.

Depth
Captures the 3D spatial layout: bright areas are close to the camera, dark areas are far away. The AI keeps the same sense of space but has much more creative freedom with what fills that space. Good for landscapes and scene composition.

OpenPose (body skeleton)
Extracts a skeleton showing the head, shoulders, elbows, wrists, hips, and knees (plus fingers, with hand detection). The generated image matches the same pose with a completely different person, clothes, and background.

Variants:
- Body only: basic skeleton
- Face detection: expression matching
- Hand detection: finger positions
- Full: body, face, and hands in one pass (try DW OpenPose)

Line art
Similar to Canny but more organic. Lines vary in weight, and some details are emphasized while others are softened, so the result feels more like a human drawing. Best for converting sketches into rendered images.

Settings that matter
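Before the individual settings, it helps to see where they plug in. The following is a minimal pure-Python sketch of the two-stage flow, assuming a tiny grayscale image stored as a list of rows (values 0 to 255). `edge_map` is a simplified stand-in for a Canny preprocessor (a plain brightness-difference threshold, without real Canny's smoothing, edge thinning, or hysteresis), and `ControlRequest` and `prepare` are hypothetical names for what stage 2 receives; none of them is a ComfyUI or ControlNet API.

```python
from dataclasses import dataclass


def edge_map(gray: list[list[int]], threshold: int = 50) -> list[list[int]]:
    """Stage 1 (simplified): mark pixels where brightness changes sharply.

    Not real Canny: it only thresholds the absolute difference to the next
    pixel on the right plus the next pixel below. The output is black (0) /
    white (255), like the line map a Canny preprocessor produces.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(gray[y][x + 1] - gray[y][x]) if x + 1 < w else 0
            dy = abs(gray[y + 1][x] - gray[y][x]) if y + 1 < h else 0
            if dx + dy > threshold:
                out[y][x] = 255
    return out


@dataclass
class ControlRequest:
    """Hypothetical bundle of what stage 2 receives (not a real API)."""
    prompt: str
    control_image: list[list[int]]
    control_type: str      # must match the ControlNet model's type
    weight: float = 0.7    # the guide's suggested starting control weight


def prepare(prompt: str, reference: list[list[int]], weight: float = 0.7) -> ControlRequest:
    """Run stage 1 and package its output with the prompt for stage 2."""
    return ControlRequest(prompt, edge_map(reference), "canny", weight)


# A 4x4 reference: dark left half, bright right half (one vertical boundary).
reference = [[0, 0, 200, 200] for _ in range(4)]
req = prepare("a red brick wall", reference)
for row in req.control_image:
    print(row)
```

Running it prints four rows of `[0, 255, 0, 0]`: a single white line where the dark and bright halves meet. In a real workflow, stage 2 would hand this map, the prompt, and the weight to a ControlNet model instead of just storing them.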
Control weight
Control weight determines how rigidly the AI follows the preprocessed structure. Start at 0.7. If you get artifacts, lower it; if the output ignores your structure, raise it.

Two rules to remember
- The preprocessor and the ControlNet model must be the same type (don't use a depth preprocessor with a pose ControlNet).
- The ControlNet model must match your diffusion model's family (a Stable Diffusion ControlNet won't work with Flux, and even within Stable Diffusion, SD 1.5 and SDXL ControlNets aren't interchangeable).

Stacking ControlNets
You can use more than one ControlNet at the same time. Combine depth (for spatial layout) with OpenPose (for pose), and each adds its own layer of structural guidance. Lower the weight of each one when stacking (try 0.5 to 0.7 per ControlNet).

FAQ

Which ControlNet type should I use for character poses?
OpenPose. It extracts the skeleton from your reference and forces the AI to match that body position. Use DW OpenPose full for the most detailed result, including body, face, and hands.

Can I use multiple ControlNets together?
Yes. Stack them to combine different types of control, and lower the weight of each one to avoid artifacts.

My ControlNet output has artifacts. What do I do?
Lower the control weight: start at 0.7 and go down from there. Also make sure the ControlNet model matches your diffusion model's family.
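The two rules and the stacking advice above can be folded into a quick pre-flight check to run before generating. This is a minimal sketch, assuming each ControlNet in the workflow is described by a hypothetical `ControlUnit` record; the type and family strings (`"depth"`, `"sd15"`, `"sdxl"`, and so on) are illustrative labels, not identifiers from any real tool.

```python
from dataclasses import dataclass


@dataclass
class ControlUnit:
    """One ControlNet in the workflow (hypothetical record, for illustration)."""
    preprocessor_type: str   # e.g. "depth", "openpose", "canny"
    controlnet_type: str     # the type the ControlNet model was trained for
    controlnet_family: str   # e.g. "sd15", "sdxl", "flux"
    weight: float = 0.7


def preflight(units: list[ControlUnit], model_family: str) -> list[str]:
    """Return human-readable problems; an empty list means no rule is broken."""
    problems = []
    for i, u in enumerate(units):
        # Rule 1: preprocessor and ControlNet model must be the same type.
        if u.preprocessor_type != u.controlnet_type:
            problems.append(f"unit {i}: {u.preprocessor_type} preprocessor "
                            f"with {u.controlnet_type} ControlNet")
        # Rule 2: the ControlNet must match the diffusion model's family.
        if u.controlnet_family != model_family:
            problems.append(f"unit {i}: {u.controlnet_family} ControlNet "
                            f"with {model_family} model")
        # Stacking advice: keep each weight around 0.5-0.7.
        if len(units) > 1 and u.weight > 0.7:
            problems.append(f"unit {i}: weight {u.weight} is high for a stack; "
                            "try 0.5-0.7")
    return problems


stack = [
    ControlUnit("depth", "depth", "sdxl", weight=0.6),
    ControlUnit("openpose", "depth", "sd15", weight=0.9),
]
for p in preflight(stack, model_family="sdxl"):
    print(p)
```

With `model_family="sdxl"`, the first unit passes and the second is flagged on all three checks. The weight check only fires for stacks, since for a single ControlNet the advice is to start at 0.7 and adjust in either direction.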