LoRA (Low-Rank Adaptation) is a technique for fine-tuning large neural networks efficiently without retraining the entire model. Instead of updating every parameter (which can number in the millions or billions), LoRA trains only a small set of additional low-rank matrices.
Why is this important?
- Large models like Stable Diffusion have hundreds of millions or billions of parameters.
- Fully fine-tuning (retraining) such a model requires enormous compute and training time.
- LoRA allows you to adapt or specialize a pretrained model with much less GPU memory and time.
How does LoRA work (conceptually)?
Matrix Decomposition Idea:
Neural networks mostly operate with large weight matrices. LoRA assumes that the *change* these matrices need during fine-tuning has low rank, so the update can be approximated by the product of two much smaller matrices (with far fewer parameters).
Instead of changing the big matrix itself, LoRA learns these smaller “delta” matrices.
- Freezing the Base Model: The original weights of the model remain frozen (unchanged). LoRA only trains the small low-rank matrices whose product is added on top of the frozen weights.
- Efficient Updates: Since the low-rank factors are far smaller than the original weight matrices, the number of trainable parameters drops drastically. For a 4096×4096 layer (~16.8M parameters), rank-8 factors hold only 2 × 4096 × 8 = 65,536 trainable parameters, roughly 0.4% of the original (see the sketch right after this list).
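To make the idea concrete, here is a minimal PyTorch sketch of a LoRA-adapted linear layer. The class name `LoRALinear` and the choices of rank `r=8` and scaling factor `alpha` are illustrative assumptions, not taken from any particular library:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer with a frozen base weight plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the original weights: they receive no gradient updates.
        for p in self.base.parameters():
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        # Low-rank factors: delta_W = B @ A, with rank r << min(d_in, d_out).
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection, starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank adjustment.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Parameter-count comparison for a 4096x4096 layer.
base = nn.Linear(4096, 4096, bias=False)
lora = LoRALinear(base, r=8)
trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora.parameters())
print(f"trainable: {trainable:,} of {total:,}")  # 65,536 of ~16.8M (~0.4%)
```

Initializing `B` to zero means the adjustment starts as a no-op (delta = 0), so training begins from the pretrained model's exact behavior.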
Benefits of LoRA:
- Lower VRAM usage: Only the small adapter matrices need gradients and optimizer states, so training consumes far less GPU memory.
- Faster Training: Fewer trainable parameters mean faster updates, so training finishes quickly even on consumer GPUs.
- Modularity: You can train multiple LoRA modules for different concepts/styles and load them together or mix them at inference time (a merging sketch follows at the end of this list).
- Preserves Original Model: Since the main weights are unchanged, you can keep using your original model and add/remove LoRA modules flexibly.
- Easy Sharing: LoRA files are small (a few MBs) compared to full model checkpoints (GBs), making sharing easier.
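As a rough sketch of the modularity point above, multiple LoRA deltas can be folded into the base weight at inference time, each with its own mixing strength. The function name `merge_loras` and the example weights (0.7 / 0.3) are hypothetical:

```python
import torch

def merge_loras(base_weight: torch.Tensor, loras, weights):
    """Fold several LoRA deltas into a single weight matrix for inference.

    Each element of `loras` is a (B, A) pair of low-rank factors, and
    `weights` holds a per-module mixing strength.
    """
    merged = base_weight.clone()
    for (B, A), w in zip(loras, weights):
        merged += w * (B @ A)  # each delta is rank-r, so B @ A is cheap to form
    return merged

# Two hypothetical adapters for the same 4096x4096 layer, mixed at inference.
d, r = 4096, 8
W = torch.randn(d, d)
style = (torch.randn(d, r) * 0.01, torch.randn(r, d) * 0.01)
concept = (torch.randn(d, r) * 0.01, torch.randn(r, d) * 0.01)
W_mixed = merge_loras(W, [style, concept], weights=[0.7, 0.3])
```

Because each delta is just `B @ A`, merging and unmerging reduce to simple additions and subtractions on the base weights, which is why adapters can be swapped in and out without touching the original checkpoint.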