Hyperparameter Rationale
Our chosen hyperparameters in `src/config.py` are based on established best practices for LoRA fine-tuning:
- `learning_rate: 2e-4`: A slightly higher learning rate is often effective for LoRA, since only a small fraction of the weights is being updated.
- `r: 16`, `lora_alpha: 16`: `r` defines the rank (complexity) of the adapter matrices. `r=16` offers a good balance between expressivity and parameter efficiency. Setting `lora_alpha` equal to `r` is a common heuristic for scaling the adapter's contribution.
- `neftune_noise_alpha: 5`: We enable NEFTune, a technique that adds noise to embedding vectors during training. This acts as a regularizer, preventing overfitting and improving the robustness of the final model.
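
For context, the sketch below shows one way these values could be wired together using the Hugging Face `peft` and `transformers` libraries. It is a minimal illustration of the rationale above, not a reproduction of `src/config.py`; the dropout, target modules, and output directory are assumptions not stated in this section.

```python
# Minimal sketch, assuming Hugging Face peft + transformers; the real
# src/config.py may organize these values differently.
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings: rank and scaling factor are kept equal (lora_alpha == r).
lora_config = LoraConfig(
    r=16,                       # rank (complexity) of the adapter matrices
    lora_alpha=16,              # scaling factor; alpha == r is a common heuristic
    lora_dropout=0.05,          # assumption: a small dropout, not specified above
    target_modules=["q_proj", "v_proj"],  # assumption: typical attention projections
    task_type="CAUSAL_LM",
)

# Training settings: a higher learning rate than full fine-tuning, plus NEFTune
# noise on the embeddings as a regularizer.
training_args = TrainingArguments(
    output_dir="outputs",       # assumption: placeholder output directory
    learning_rate=2e-4,         # higher LR is effective when only adapters are trained
    neftune_noise_alpha=5,      # enables NEFTune noise injection during training
)
```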