
Training a Style Embedding for Stable Diffusion Using Textual Inversion


In this post, I go through the steps and parameters I used to train a style embedding for Stable Diffusion with textual inversion.

Prerequisites

I collected 17 anime images from Twitter and cropped them to 768×768. Only 3 of them contain faces.
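The post doesn't say how the crops were made. If you want to automate this step, a minimal sketch is a center crop to the largest square followed by a resize to 768×768. This assumes Pillow is installed and that a center crop frames the subject well enough; `center_square_box` and `crop_to_training_size` are hypothetical helper names, not part of the WebUI.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def crop_to_training_size(path_in: str, path_out: str, size: int = 768) -> None:
    """Center-crop an image to a square and resize it to size x size pixels.

    Assumes Pillow is installed; it is imported here so that
    center_square_box() can be used without it.
    """
    from PIL import Image

    with Image.open(path_in) as img:
        box = center_square_box(*img.size)
        square = img.convert("RGB").crop(box)
        square.resize((size, size), Image.LANCZOS).save(path_out)


# A 1200x800 landscape image keeps its central 800x800 region.
print(center_square_box(1200, 800))  # (200, 0, 1000, 800)
```

Calling `crop_to_training_size("in.png", "out.png")` on each file would produce 768×768 training images; for subjects that are off-center, a manual crop like the one used in this post gives more control.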

My hardware/software setup:

RTX 3070 with 8 GB VRAM
32 GB RAM
Windows 11

I use the AUTOMATIC1111 WebUI with the following command-line arguments (set in webui-user.bat):

set COMMANDLINE_ARGS=--deepdanbooru --xformers --medvram

I had to add --medvram to reduce VRAM consumption. Without it, training at 768×768 fails with a CUDA out-of-memory error, while training at 512×512 works fine.
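A rough way to see why 768×768 needs more memory than 512×512: Stable Diffusion trains in the VAE latent space, which is 8× smaller per side and has 4 channels, so the latent tensor (and the activations that scale with it) grows with the pixel count. The sketch below only does that arithmetic. Actual VRAM use also depends on the attention implementation (e.g. xformers), the model weights and other overhead, so treat the ratio as intuition for how activation memory grows, not as a measurement.

```python
def latent_shape(width: int, height: int, channels: int = 4, factor: int = 8) -> tuple[int, int, int]:
    """(C, H, W) of the SD VAE latent for an image of the given pixel size."""
    return (channels, height // factor, width // factor)


def latent_elements(width: int, height: int) -> int:
    """Number of values in the latent tensor for one image."""
    c, h, w = latent_shape(width, height)
    return c * h * w


for size in (512, 768):
    print(size, latent_shape(size, size), latent_elements(size, size))
# 512 (4, 64, 64) 16384
# 768 (4, 96, 96) 36864

ratio = latent_elements(768, 768) / latent_elements(512, 512)
print(f"768 vs 512 latent size ratio: {ratio:.2f}")  # 2.25
```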

The SD-v2.1-ema model should be loaded as the active checkpoint before training.

Training

Important training parameters:

17 images, 768×768
No CLIP captioning: I pointed training at a directory containing only the images, with no caption files.
No preprocessing: the images were cropped beforehand.
Number of vectors per token: 8
Embedding Learning rate: 0.0001
Batch size: 1
Gradient accumulation steps: 1
Max steps: 4000
Choose latent sampling method: deterministic
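To make these settings concrete, here is the arithmetic they imply, assuming each training step processes batch size × gradient accumulation steps images, and that an SD 2.x embedding vector has 1024 dimensions (the width of the OpenCLIP ViT-H text encoder; SD 1.x uses 768). `TISettings` is a hypothetical container for illustration, not a WebUI class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TISettings:
    num_images: int = 17
    vectors_per_token: int = 8
    learning_rate: float = 1e-4
    batch_size: int = 1
    grad_accum_steps: int = 1
    max_steps: int = 4000
    # Assumption: SD 2.x text encoder width (768 for SD 1.x models).
    token_dim: int = 1024

    @property
    def effective_batch(self) -> int:
        """Images contributing to each optimizer step."""
        return self.batch_size * self.grad_accum_steps

    @property
    def images_seen(self) -> int:
        """Total training images processed over the whole run."""
        return self.max_steps * self.effective_batch

    @property
    def epochs(self) -> float:
        """How many times each image is seen on average."""
        return self.images_seen / self.num_images

    @property
    def trainable_params(self) -> int:
        """Values actually learned: one vector of token_dim per token slot."""
        return self.vectors_per_token * self.token_dim


s = TISettings()
print(s.effective_batch)     # 1
print(s.images_seen)         # 4000
print(round(s.epochs, 1))    # 235.3
print(s.trainable_params)    # 8192
```

With only 17 images, 4000 steps means each image is seen roughly 235 times. The 8 vectors also occupy 8 token slots in the prompt whenever the embedding is used.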

Training took about 1 hour.

Results

In each block, the first column shows the image generated without the embedding, and the remaining columns show the results with the embedding saved at 1000, 2000, 3000 and 4000 steps.
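One way to produce such a comparison, assuming each intermediate checkpoint (for example, saved via the training tab's option to save a copy of the embedding every N steps) is placed in the embeddings folder under its own name: keep the prompt, seed and sampler fixed and change only which embedding name is appended. In the WebUI, an embedding is referenced in the prompt by its file name without extension. The names `my-style-1000` and so on are hypothetical, since the post does not give the embedding's name.

```python
def comparison_prompts(base: str, embedding: str, steps: list[int]) -> list[str]:
    """Column 1: base prompt without embedding; then one prompt per checkpoint."""
    prompts = [base]  # baseline column, no embedding
    # Hypothetical naming scheme "<embedding>-<step>"; adjust to your file names.
    prompts += [f"{base} {embedding}-{step}" for step in steps]
    return prompts


base = ("a photo of a beautiful landscape, mountains, forest, "
        "dramatic lighting, realistic, highly detailed, art by")
for p in comparison_prompts(base, "my-style", [1000, 2000, 3000, 4000]):
    print(p)
```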

photo, close-up portrait of beautiful woman, long red hair, flying hair, gold and diamond armour, fineart, extremely detailed, art by
Negative prompt: ugly, blurry, deformed, bad proportions, tattoo, freckles, drawing, painting
Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 3122800040, Size: 768x768, Model hash: 4bdfc29c, ENSD: -1

a photo of a beautiful landscape, mountains, forest, dramatic lighting, realistic, highly detailed, art by
Negative prompt: ugly, blurry
Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 4028064328, Size: 768x768, Model hash: 4bdfc29c, ENSD: -1

a photo of a cozy kitchen with large windows, a coffee table with cups and bottles, dramatic lighting, realistic, highly detailed, art by
Negative prompt: ugly, blurry, painting, drawing, text, watermark
Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 2279164861, Size: 1024x768, Model hash: 4bdfc29c, ENSD: -1

a photo of a cool car driving on road along the beach with pamls, dramatic lighting, realistic, highly detailed, art by
Negative prompt: ugly, blurry, painting, drawing, text, watermark
Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 2279164861, Size: 1024x768, Model hash: 4bdfc29c, ENSD: -1
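The last line of each block uses the WebUI's comma-separated `Key: value` format for generation parameters (ENSD is the "Eta noise seed delta" setting). Below is a minimal parser for these lines, handy when reproducing a result. It assumes no value itself contains a comma followed by a space; the WebUI quotes such values, and this sketch does not handle that case.

```python
def parse_params(line: str) -> dict[str, str]:
    """Parse 'Key: value, Key: value' generation parameters into a dict."""
    params: dict[str, str] = {}
    for item in line.split(", "):
        key, _, value = item.partition(": ")
        params[key.strip()] = value.strip()
    return params


line = ("Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 2279164861, "
        "Size: 1024x768, Model hash: 4bdfc29c, ENSD: -1")
p = parse_params(line)
width, height = map(int, p["Size"].split("x"))
print(p["Sampler"], int(p["Seed"]), width, height)  # Euler a 2279164861 1024 768
```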

Conclusion

Even though there are already hundreds of anime models and embeddings, I'm generally happy with the results.

In most cases, the final embedding (4000 steps) gives the best results. Next time I’ll try a larger dataset and more training steps.