Training / Standard term
Instruction tuning
Training a model on thousands of "question and good answer" pairs so it learns to follow instructions, answer questions, and complete tasks instead of simply continuing whatever text it is given.
Instruction tuning is a supervised fine-tuning stage in which a pretrained model trains on thousands of curated pairs, each made of a request ("Summarize this article," "Write a Python function that sorts a list") and a high-quality response, so that it learns to follow instructions, answer questions, and complete tasks. The model is still predicting the next word, but after seeing enough of these pairs, what it predicts is a helpful answer to the task rather than a plausible continuation of the prompt. This stage turns a raw pretrained model from a text-prediction engine into something that feels like an assistant.
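To make the pairing concrete, here is a minimal sketch of how one request/response pair might become a training row. The prompt template, the whitespace "tokenizer," and the names Example and build_training_row are assumptions made for illustration, not any particular library's format. The one real convention it borrows is the label value -100, which PyTorch's CrossEntropyLoss ignores by default; masking the prompt this way is a common choice, not a universal one.

```python
from dataclasses import dataclass

IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default


@dataclass
class Example:
    instruction: str
    response: str


def build_training_row(example, vocab):
    """Turn one instruction/response pair into token ids plus loss labels."""
    # Hypothetical prompt template; real projects each define their own.
    prompt = f"### Instruction:\n{example.instruction}\n### Response:\n"
    prompt_tokens = prompt.split()
    response_tokens = example.response.split() + ["<eos>"]

    def to_id(token):
        # Toy tokenizer: each new whitespace-separated token gets the next id.
        return vocab.setdefault(token, len(vocab))

    prompt_ids = [to_id(t) for t in prompt_tokens]
    response_ids = [to_id(t) for t in response_tokens]

    input_ids = prompt_ids + response_ids
    # Only response tokens contribute to the loss; prompt positions are masked.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


vocab = {}
pair = Example(
    instruction="Write a Python function that sorts a list",
    response="def sort_list(xs): return sorted(xs)",
)
input_ids, labels = build_training_row(pair, vocab)
masked = sum(1 for label in labels if label == IGNORE_INDEX)
print(len(input_ids), masked)
```

In this sketch the model sees the whole row as input, but only the response positions carry labels, so the training signal comes from the answer and not from the request.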
Builder example
Every time you give a model a natural-language instruction and it responds helpfully, you are relying on instruction tuning. This stage largely determines how well a model follows directions, formats outputs the way you ask, and stays on task. Models with stronger instruction tuning tend to be more reliable for structured workflows like "extract these five fields from this document."
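As a rough illustration of such a structured workflow, the sketch below builds an extraction prompt and checks whether a reply honored the requested format. The field names and the helpers build_prompt and parse_reply are hypothetical, and the two replies are hand-written stand-ins, not real model output; no model is called here.

```python
import json

# Hypothetical field names, chosen only for illustration.
FIELDS = ["invoice_number", "date", "vendor", "total", "currency"]


def build_prompt(document):
    """Compose an instruction asking for exactly the fields in FIELDS as JSON."""
    field_list = ", ".join(FIELDS)
    return (
        "Extract these five fields from the document and reply with a single "
        f"JSON object containing exactly these keys: {field_list}.\n\n"
        f"Document:\n{document}"
    )


def parse_reply(reply):
    """Return the parsed fields, or None if the reply ignored the format."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(FIELDS):
        return None
    return data


prompt = build_prompt("Invoice 1042 from Acme, dated 2024-03-01, total 99.50 USD")

# Hand-written stand-ins for two possible replies.
on_format = json.dumps({
    "invoice_number": "1042",
    "date": "2024-03-01",
    "vendor": "Acme",
    "total": "99.50",
    "currency": "USD",
})
off_format = "Sure! Here are the fields you asked for: invoice 1042, Acme."

print(parse_reply(on_format) is not None)
print(parse_reply(off_format) is not None)
```

A check like parse_reply is where weaker instruction following shows up in practice: the more often replies drift from the requested format, the more retries or cleanup the workflow needs.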
Common confusion: Instruction tuning and reinforcement learning from human feedback (RLHF) are different stages. Instruction tuning teaches the model to follow directions using supervised examples, each with one reference response to imitate. RLHF typically comes afterward and further adjusts the model using human ratings of which responses are better.
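One way to see the difference is in the shape of the training data. The sketch below is illustrative only: the class and field names are assumptions, and real preference datasets may use rankings or scores instead of a single chosen/rejected pair.

```python
from dataclasses import dataclass


@dataclass
class SupervisedExample:
    """Instruction tuning: one prompt with one reference response to imitate."""
    prompt: str
    response: str


@dataclass
class PreferenceExample:
    """RLHF: one prompt with two candidate responses and a human judgment."""
    prompt: str
    chosen: str    # the response the human rater preferred
    rejected: str  # the response the human rater liked less


sft_row = SupervisedExample(
    prompt="Summarize this article",
    response="A short, accurate summary of the article.",
)
preference_row = PreferenceExample(
    prompt="Summarize this article",
    chosen="A short, accurate summary of the article.",
    rejected="A long reply that drifts away from the article.",
)
```

The supervised row says "produce this," while the preference row only says "this one is better than that one," which is why the two stages use different training procedures.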