Glossary definition

Training / Standard term

Post-training

Everything that happens after a model's initial pretraining to turn it from a raw text predictor into a usable assistant, including instruction tuning, preference optimization, safety training, and behavioral adjustments.

A pretrained model knows language and facts, but it has no sense of how to be helpful, safe, or well-behaved. Post-training is the collection of stages that shape it into something people can interact with productively. These stages typically include instruction tuning (teaching it to follow requests), preference learning such as RLHF or DPO (teaching it which responses humans prefer), safety training (teaching it to refuse harmful requests), and personality adjustments. Two models built on the same pretrained base can feel completely different after post-training: one might be concise and direct, another verbose and cautious, depending on how these stages were handled.
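To make the preference-learning stage concrete, here is a minimal sketch of a DPO-style loss in PyTorch. It assumes you already have per-sequence log-probabilities from the policy being trained and from a frozen reference model; the function name, arguments, and beta value are illustrative, not taken from any particular library.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss over a batch of preference pairs.

    Each tensor holds one log-probability per pair; beta controls how far
    the policy is allowed to drift from the reference model.
    """
    # Implicit reward: how much more the policy favors each response
    # than the reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the preferred response's reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example with a batch of two preference pairs (made-up log-probs):
loss = dpo_loss(torch.tensor([-12.3, -8.1]), torch.tensor([-14.0, -9.5]),
                torch.tensor([-12.0, -8.4]), torch.tensor([-13.5, -9.2]))
```

The frozen reference model is what keeps this stage from overwriting the base model's knowledge: the loss only rewards moving toward preferred responses relative to where the reference already stands.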

Builder example

Post-training is why model selection is more nuanced than comparing parameter counts or benchmark scores. The same base model can produce wildly different experiences depending on how it was post-trained. When you notice that one model refuses too often while another is too agreeable, or that one follows formatting instructions well while another ignores them, you are seeing different post-training choices at work.

Common confusion: Post-training is an umbrella term covering fine-tuning, RLHF, safety training, and more. Fine-tuning is one technique within post-training. The two terms are not synonyms.