See how does Stable Diffusion actually work for a detailed explanation on how the AI does it's thing.
General hints for text to image engines:
- The type of image (photo, painting, etc.).
- Then the people, animals, and objects.
- Followed by the landscape.
- And finally, the context.
Pick a good model, preferably one that is specialised in your general image type and if available you can enhance with a LoRA or similar focused on your target image type. I use
civitai.com to source these. EasyDiffusion now supports combining multiple LoRAs together.
For example the following images created on EasyDiffusion using the Realistic Vision 4.0 model, with the prompt "taylor swift in strapless green dress,fairy wings, a photograph of taylor swift as tinker bell the fairy sitting on a red mushroom" used two
LoRAs (Taylor Swift and Tinker Bell) to create the exact mix of image required.
See more at Easy Diffusion