Data Curation and Preprocessing
The model's expertise is derived from a custom-curated dataset, data/emergency_dataset.jsonl. Each entry is a JSON object containing an instruction (an emergency-related question) and a high-quality output (a safe, step-by-step answer).
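
As a rough illustration of the entry layout described above, the sketch below parses JSONL text and checks that each record carries the instruction and output fields. The sample lines are placeholders written for this example, not real dataset entries, and parse_jsonl is a hypothetical helper rather than a function from the project.

```python
import json

# Placeholder lines for illustration only; the real entries live in
# data/emergency_dataset.jsonl and are not reproduced here.
SAMPLE_JSONL = """\
{"instruction": "Example emergency question 1?", "output": "Step 1. Step 2."}
{"instruction": "Example emergency question 2?", "output": "Step 1. Step 2. Step 3."}
"""

REQUIRED_KEYS = ("instruction", "output")


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, one per non-empty line.

    Hypothetical helper: raises ValueError if a record lacks a required key.
    """
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_number} is missing {missing}")
        records.append(record)
    return records


records = parse_jsonl(SAMPLE_JSONL)
print(len(records), sorted(records[0]))
```

To check the actual file, the same function could be given the contents of data/emergency_dataset.jsonl read as UTF-8 text.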
Before training, this data is formatted using the format_chat_template function in train_pipeline.py. This function applies the model's official chat template, structuring the data into the conversational format (<start_of_turn>user...<end_of_turn>...) that the instruction-tuned base model was trained on. Matching the prompt format the base model already expects is critical for effective fine-tuning.
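
The formatting step can be sketched as follows. This is not the project's format_chat_template. It is a hand-written stand-in that assumes a Gemma-style layout (a user turn followed by a model turn, each closed with <end_of_turn>) and a "text" output key, both of which are assumptions. In practice the string would more likely come from the tokenizer's own chat template, for example through the apply_chat_template method in Hugging Face transformers, so that the markers match the base model exactly.

```python
def format_chat_example(entry):
    """Hypothetical stand-in for format_chat_template.

    Wraps one dataset entry in an assumed Gemma-style turn layout and
    returns it under a "text" key (the key name is also an assumption).
    """
    user_turn = f"<start_of_turn>user\n{entry['instruction']}<end_of_turn>\n"
    model_turn = f"<start_of_turn>model\n{entry['output']}<end_of_turn>\n"
    return {"text": user_turn + model_turn}


# Placeholder entry for illustration only.
example = {
    "instruction": "Example emergency question?",
    "output": "Step 1. Step 2.",
}
formatted = format_chat_example(example)
print(formatted["text"])
```

Hard-coding the markers keeps the sketch free of dependencies, but it can drift from the official template. Generating the string from the tokenizer avoids that risk.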