Justification for Tooling Choices

  • Unsloth: Fine-tuning carries significant computational costs, so we used Unsloth, which provides highly optimized kernels that enable up to 2x faster training and reduce memory usage by 60% without sacrificing performance. This is achieved through manual autograd functions, re-engineered RoPE embeddings, and other deep optimizations. Its seamless FastModel integration allowed us to implement advanced techniques like QLoRA with minimal boilerplate code, making the entire process more efficient and accessible.
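A rough back-of-the-envelope sketch of why QLoRA keeps memory low (illustrative arithmetic only; the matrix dimensions and rank below are typical assumed values, not taken from our training run or from Unsloth's API): a rank-r LoRA adapter on a d x k weight matrix trains only r*(d+k) parameters while the d*k base weights stay frozen in 4-bit precision.

```python
# Hypothetical example: LoRA adapter size for one attention projection.
d, k, r = 4096, 4096, 16           # assumed layer shape and LoRA rank
full = d * k                       # frozen base weights (~16.8M parameters)
adapter = r * (d + k)              # trainable LoRA parameters (131,072)
fraction = adapter / full
print(f"trainable fraction per layer: {fraction:.2%}")  # → 0.78%
```

With under 1% of the weights trainable per layer, optimizer state and gradients shrink accordingly, which is what makes single-GPU fine-tuning of multi-billion-parameter models practical.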

  • GGUF and llama.cpp: Our end goal is a model that is not only accurate but also deployable in resource-constrained environments. We chose the GGUF (GPT-Generated Unified Format) for this purpose. GGUF is a file format designed by the llama.cpp community for packaging and running LLMs efficiently. It stores quantized model weights (reduced in precision from 16-bit to as low as 2-bit), drastically shrinking file size and enabling fast inference on CPUs or consumer-grade GPUs. This makes our emergency assistant potentially deployable on edge devices or personal computers, increasing its real-world impact.