Stable Diffusion loads its pretrained weights from model files, commonly called checkpoints. They come in two formats: .ckpt (a pickled PyTorch checkpoint) and .safetensors. Both are large binary files that store the model’s learned weights.
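
To make the "binary file of weights" idea concrete, here is a minimal sketch of reading the header of a .safetensors file using only the Python standard library. It relies on the published safetensors layout (an 8-byte little-endian length, then a JSON header, then raw tensor bytes). The helper names and the tiny demo file with its "demo.weight" tensor are hypothetical, made up purely for illustration; a real checkpoint would list thousands of tensors.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON mapping tensor names to
        # dtype, shape, and byte offsets into the data section.
        return json.loads(f.read(header_len).decode("utf-8"))


def write_demo_file(path):
    """Write a tiny one-tensor file (hypothetical data, for illustration only)."""
    data = struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)  # four float32 values
    header = {
        "demo.weight": {
            "dtype": "F32",
            "shape": [2, 2],
            "data_offsets": [0, len(data)],
        }
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(data)


def demo_roundtrip():
    """Write the demo file to a temporary directory and read its header back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.safetensors")
        write_demo_file(path)
        return read_safetensors_header(path)
```

Calling read_safetensors_header on a real model file should list its tensor names and shapes without loading the weights themselves. A .ckpt file cannot be inspected this way, since it is a pickle that has to be deserialised by PyTorch.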

Different checkpoints have different focuses:

SD 1.5: Early, widely used release; a solid general-purpose model trained at 512×512.
SD 2.1: Later release with a higher native resolution (768×768) and a different text encoder.
SDXL: Larger model with a native resolution of 1024×1024, aimed at higher-resolution and more realistic output.
Specialised models:
- Waifu Diffusion: Focused on anime and manga styles.
- Realistic Vision, Anything V5, etc.: Community fine-tunes aimed at a particular look, such as photorealistic portraits (Realistic Vision) or anime-style art (Anything V5).

Why use different models?

Each checkpoint is trained or fine-tuned on different data, so it excels at different styles, subjects, or quality levels.
You can switch between models depending on the kind of images you want.
Some models are optimised for speed, others for detail or style.
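
Switching models usually starts with knowing which checkpoint files are available locally. The following is a minimal sketch, assuming all checkpoints sit somewhere under a single directory. The function name list_checkpoints and the directory layout are assumptions for illustration, not the behaviour of any particular Stable Diffusion interface.

```python
from pathlib import Path

# File extensions treated as checkpoints, per the two formats described above.
CHECKPOINT_EXTENSIONS = {".ckpt", ".safetensors"}


def list_checkpoints(models_dir):
    """Return all checkpoint files under models_dir, sorted by path."""
    return sorted(
        p
        for p in Path(models_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in CHECKPOINT_EXTENSIONS
    )
```

A caller could show these paths in a menu and pass the chosen one to whatever loader its toolchain provides.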

🧠 Current Stable Diffusion Base Models

1. Stable Diffusion 1.x Series

Versions: 1.1, 1.2, 1.3, 1.4, and 1.5
Resolution: 512×512 pixels
Architecture: Uses OpenAI's CLIP ViT-L/14 as the text encoder for prompt conditioning
Popularity: Widely adopted due to its compatibility with various fine-tuned models like Anything V3, DreamShaper, and Realistic Vision.

2. Stable Diffusion 2.x Series

Versions: 2.0 and 2.1
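Resolution: 768×768 pixels (512×512 base variants were also released)
Architecture: Uses the OpenCLIP ViT-H/14 text encoder in place of OpenAI's CLIP ViT-L/14, so prompts written for 1.x models may behave differently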
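
The two series described so far can be summarised as a small lookup table, which is handy when a script needs a sensible default image size for a given base model. This is only a sketch: the BaseModel class, the "sd1" and "sd2" keys, and default_size are hypothetical names, and the values are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseModel:
    name: str
    versions: tuple
    native_resolution: int  # side length in pixels of the square training size
    text_encoder: str


# Keys are hypothetical short identifiers chosen for this example.
BASE_MODELS = {
    "sd1": BaseModel(
        name="Stable Diffusion 1.x",
        versions=("1.1", "1.2", "1.3", "1.4", "1.5"),
        native_resolution=512,
        text_encoder="CLIP ViT-L/14",
    ),
    "sd2": BaseModel(
        name="Stable Diffusion 2.x",
        versions=("2.0", "2.1"),
        native_resolution=768,
        text_encoder="OpenCLIP ViT-H/14",
    ),
}


def default_size(family):
    """Return (width, height) matching the family's native resolution."""
    side = BASE_MODELS[family].native_resolution
    return (side, side)
```

Generating at a model's native resolution tends to give the most reliable results, which is why a default like this is worth keeping next to the model choice.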