Ep 4: Text-to-Image from scratch

11 min

What we're building

A complete text-to-image workflow. No template: you'll place every node and understand why each one is there.

A text-to-image workflow does five things:

1. Loads an AI model
2. Takes a text prompt
3. Creates a blank canvas for the AI to start from
4. Generates the image
5. Converts the result into something you can see and save

Place the KSampler

Start here. The KSampler is where generation happens; everything else feeds into it. You'll see four inputs on the left: model, positive, negative, and latent image.

Load a diffusion model

Add a Load Diffusion Model node. Connect its violet output to the KSampler's violet model input. This gives the KSampler the AI model it needs to generate.

Drag from an empty connector and drop it on the canvas. ComfyUI shows you a filtered list of nodes that are compatible with that connector.

Add your prompts

Add two CLIP Text Encode nodes:

- Connect one to the KSampler's positive input (what you want: "sunflower on a meadow").
- Connect one to the negative input (what to avoid: "blurry, distorted").

Both need a CLIP model. Add a Load CLIP node and connect it to both. Make sure the CLIP model matches your diffusion model.

Create an empty starting canvas

Add an Empty Latent Image node. Set width and height (1024x1024 for most modern models). Connect it to the KSampler's latent image input. This is the blank slate the AI starts refining.

Decode and save

The KSampler outputs latent data (pink). You need to convert it to a visible image. Add a VAE Decode node and connect the KSampler's output to it. Load a VAE model and connect that too. Then add a Save Image node and connect the VAE Decode output (blue) to it.

Click Run. Your image appears.

Generated images embed the workflow data. Anyone can drag a generated image onto their ComfyUI canvas to load the exact workflow that created it.

FAQ

What is the KSampler?
The core generation node. It takes noise and gradually refines it into an image that matches your prompt, using the model, positive/negative conditioning, and a starting canvas.

How do I know which CLIP and VAE to use?

Check your diffusion model's documentation. Each model specifies compatible CLIP and VAE files. Mismatched models produce poor results or errors.

Can I share this workflow?

Yes. All Floyo workflows are live, shareable links that can be sent directly to other creators. To share a workflow with the same settings or inputs, add people to your private Floyo team and share it with them. You can also download the workflow JSON from the canvas.

Can I control pose or style in a text-to-image workflow?

Not with text alone. For structural control like pose, depth, or edges, you need ControlNet. That's covered in Ep 10: ControlNet Basics.

Ep 3: Nodes 101
Ep 5: KSampler Settings
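
For readers who want to see the wiring from this episode written down as data, here is a minimal sketch of the same graph as a Python dictionary, modeled on the API-style workflow JSON that ComfyUI can export. Treat the details as assumptions: the `class_type` and input names follow ComfyUI's built-in nodes as commonly exported, the file names are placeholders, the CLIP `type` must match your model, and the sampler values (seed, steps, cfg, and so on) are illustrative only. The `dangling_links` helper is hypothetical and exists just to check the wiring.

```python
import json


def build_text_to_image_workflow(positive, negative, width=1024, height=1024):
    """Return an API-style graph: node id -> {"class_type", "inputs"}.

    A connection is written as [source_node_id, output_index].
    """
    return {
        # Load Diffusion Model (file name is a placeholder)
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "your_diffusion_model.safetensors",
                         "weight_dtype": "default"}},
        # Load CLIP (placeholder file; "type" is an assumption)
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "your_clip_model.safetensors",
                         "type": "stable_diffusion"}},
        # Positive prompt: what you want
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["2", 0]}},
        # Negative prompt: what to avoid
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["2", 0]}},
        # Empty starting canvas
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        # KSampler: model, positive, negative, and latent image feed in here
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0],
                         "positive": ["3", 0],
                         "negative": ["4", 0],
                         "latent_image": ["5", 0],
                         # Illustrative settings only
                         "seed": 0, "steps": 20, "cfg": 8.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        # Load VAE (file name is a placeholder)
        "7": {"class_type": "VAELoader",
              "inputs": {"vae_name": "your_vae.safetensors"}},
        # VAE Decode: latent data -> visible image
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["7", 0]}},
        # Save Image
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": "ep4"}},
    }


def dangling_links(workflow):
    """List (node_id, input_name) pairs whose link points at a missing node."""
    missing = []
    for node_id, node in workflow.items():
        for name, value in node["inputs"].items():
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str))
            if is_link and value[0] not in workflow:
                missing.append((node_id, name))
    return missing


if __name__ == "__main__":
    wf = build_text_to_image_workflow("sunflower on a meadow",
                                      "blurry, distorted")
    print(json.dumps(wf, indent=2))
```

Reading the dictionary top to bottom mirrors the steps above: two loaders and two prompts feed the KSampler, the empty latent gives it a canvas, and VAE Decode plus Save Image turn the result into a file. Swap the placeholder file names for models that are actually installed and that match each other.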