Ep 7: Models explained

10 min

Why models matter

Without models, your workflow is just a set of empty nodes. The models are what actually know how to generate images, translate text, encode data, and control composition. Here's what each type does.

Diffusion models: the main engine

A diffusion model takes random noise and turns it into a coherent image matching your prompt. Every image generation workflow needs one.

These are large files. Older models were around 6 GB. Modern ones are commonly 10-20 GB, some over 40 GB.

Two file formats to know:

safetensors is the standard. Use this.
GGUF is a compressed format that uses less memory at some quality cost.

Avoid .ckpt files. This older format can contain hidden executable code. Stick with safetensors.

LoRA (Low-Rank Adaptation)

A LoRA is a small model that modifies a diffusion model's behavior. It doesn't generate on its own; it steers.

Example: a diffusion model doesn't know what a specific person looks like unless they were in the training data. You can create a LoRA trained on photos of that person. Pair it with the diffusion model, and it generates accurate images of them in different poses, lighting, and styles.

LoRAs work for anything: art styles, product designs, characters, clothing, architecture. They're small, so you can collect and swap between many of them.

CLIP

CLIP translates your text prompt into numbers the diffusion model can work with. It sits between your words and the AI.

A good CLIP model understands nuance: "a dog sitting calmly" is different from "a dog jumping excitedly." Using the wrong CLIP model is like using a translator who speaks a different language than your model. Each diffusion model requires a specific CLIP model. Check the documentation.

VAE (Variational Autoencoder)

The VAE converts data between two spaces:

Encode: visible image → latent space (so the AI can work on it)
Decode: latent space → visible image (so you can see the result)

If your colors look off or your output looks washed out, it might be the wrong VAE.

ControlNet

ControlNet gives you structural control over composition. Text prompts describe what to generate. ControlNet controls how it's arranged: the pose, the depth, the edges. More on this in Episode 10.

The model landscape

The open-source image generation space moves fast. Stable Diffusion 1.5 kicked things off. SDXL was a major step up. Now there's Wan, Qwen, Flux, Z-Image, and more. New models come out constantly.

The point of this course isn't to memorize which model is best right now. It's to understand how models work in ComfyUI so that when the next one arrives, you already know how to use it.

FAQ

What is a LoRA and when would I use one?
A LoRA is a small add-on model that teaches a diffusion model something specific: a face, an art style, a product design. Load it alongside your diffusion model using a Load LoRA node.

Why do my images look bad even with a good model?
Most likely a mismatch. Your CLIP, VAE, and diffusion model all need to be compatible with each other. Check your diffusion model's documentation for the recommended CLIP and VAE.

What file format should I use for models?
safetensors. It's the current standard and it's safe. GGUF works too for lower memory usage. Avoid .ckpt.
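
The file-format advice above can be sketched as a small Python check. This is only an illustration, not part of ComfyUI: the function name classify_model_file, the labels it returns, and the example file names are all made up for this sketch. It looks only at the file extension, so it cannot tell you whether a file's contents are actually safe.

```python
from pathlib import Path

# Hypothetical labels for this sketch; they mirror the advice in this lesson.
FORMAT_ADVICE = {
    ".safetensors": "preferred",    # the current standard
    ".gguf": "ok-low-memory",       # compressed, some quality cost
    ".ckpt": "avoid",               # older format, may contain executable code
}


def classify_model_file(filename: str) -> str:
    """Return a rough recommendation based only on the file extension."""
    suffix = Path(filename).suffix.lower()
    return FORMAT_ADVICE.get(suffix, "unknown")


if __name__ == "__main__":
    # Example file names are placeholders, not real model files.
    for name in ["model.safetensors", "model-Q4.gguf", "old_model.ckpt"]:
        print(name, "->", classify_model_file(name))
```

A check like this is handy for scanning a models folder before loading anything, but treat it as a first filter: always download models from sources you trust.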