LoRA (Low-Rank Adaptation) is a technique for fine-tuning large neural networks efficiently without retraining the entire model. Instead of updating every parameter (there can be millions or billions of them), LoRA trains only a small set of low-rank matrices added alongside the original weights.


Why is this important?

Fully fine-tuning a model with billions of parameters requires enormous memory and compute. By training only a tiny fraction of the parameters, LoRA makes adaptation feasible on much more modest hardware.

How does LoRA work (conceptually)?

Matrix Decomposition Idea:

Neural networks operate mostly on large weight matrices. LoRA assumes that the *update* to such a matrix can be approximated by the product of two much smaller matrices of low rank, which together contain far fewer parameters.

Instead of changing the big matrix itself, LoRA learns these smaller “delta” matrices and adds their product on top of it.
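The parameter savings from this decomposition can be sketched with a quick calculation. The sizes below (a 4096×4096 weight matrix, rank 8) are illustrative assumptions, not values from the text:

```python
import numpy as np

d, r = 4096, 8  # hidden size and LoRA rank (illustrative values)

# Full update: one dense d x d matrix.
full_params = d * d

# LoRA update: delta_W is approximated as B @ A,
# where B is (d x r) and A is (r x d).
lora_params = d * r + r * d

print(full_params)                 # 16777216
print(lora_params)                 # 65536
print(full_params // lora_params)  # 256 -> roughly 256x fewer trainable parameters
```

The smaller the rank `r`, the larger the savings; in practice `r` is often in the single or low double digits.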

  1. Freezing the Base Model: The original weights of the model remain frozen (unchanged). LoRA only learns these small “low-rank” matrices that add adjustments to the frozen weights.
  2. Efficient Updates: Because the low-rank matrices contain far fewer entries than the full weight matrix, the number of trainable parameters drops drastically.
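The two steps above can be sketched as a forward pass. This is a minimal numpy illustration under assumed names and sizes, not a reference implementation; the common convention of initializing `B` to zero (so training starts exactly at the frozen base weights) is also an assumption, along with the `alpha / r` scaling:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 4  # illustrative dimensions and rank
alpha = 8                   # LoRA scaling hyperparameter (assumed value)

# Step 1: the base weight matrix stays frozen during fine-tuning.
W = rng.normal(size=(d_out, d_in))

# Step 2: only the small low-rank factors A and B are trainable.
# B starts at zero, so the initial "delta" B @ A is zero.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))

def lora_forward(x):
    """Base output plus the scaled low-rank adjustment: x W^T + (alpha/r) x A^T B^T."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
# With B = 0, the LoRA path contributes nothing: output equals the frozen base.
assert np.allclose(lora_forward(x), x @ W.T)
```

During training, gradients flow only into `A` and `B`; `W` is never touched, which is what keeps the memory and storage cost of fine-tuning so low.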

Benefits of LoRA: