
Training / Standard term

Constitutional AI

A training method where a model improves its own outputs by following a written set of principles, reducing the need for human raters to judge every response.

In practice, the model generates a response, critiques its own answer against a list of explicit behavioral rules (the "constitution"), rewrites the answer to comply, and the revised version becomes training data for fine-tuning. The same principles can also guide comparisons between candidate responses: a principle might say "Choose the response that is least likely to encourage illegal activity," and the model's choices then serve as preference labels that steer it toward that standard. This AI-generated feedback replaces much of the expensive human rating step used in traditional reinforcement learning from human feedback (RLHF).
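The generate, review, rewrite loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Model` type, the `TrainingExample` class, the `critique_and_revise` function, and the prompt wording are all hypothetical names chosen for this sketch, and `model` stands in for whatever text-generation call you actually use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a "model" is any function that maps a prompt string to a completion.
Model = Callable[[str], str]


@dataclass
class TrainingExample:
    prompt: str
    response: str  # the final, revised response


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> TrainingExample:
    """Run one generate -> critique -> revise pass per principle."""
    # Step 1: the model drafts an initial response.
    response = model(prompt)

    for principle in principles:
        # Step 2: the model reviews its own answer against one principle.
        critique = model(
            f"Principle: {principle}\n"
            f"Prompt: {prompt}\n"
            f"Response: {response}\n"
            "Critique the response against the principle."
        )
        # Step 3: the model rewrites the answer in light of that critique.
        response = model(
            f"Principle: {principle}\n"
            f"Prompt: {prompt}\n"
            f"Response: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to comply."
        )

    # Step 4: the revised answer is kept as a training example.
    return TrainingExample(prompt=prompt, response=response)
```

With N principles, this sketch makes 1 + 2N model calls per prompt. A real pipeline would likely sample a subset of principles per prompt and filter the resulting examples before fine-tuning; those choices are left out here.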

Builder example

The broader lesson applies to anyone building AI products: writing explicit behavioral principles makes your system's values inspectable and debuggable. When you can point to a document that says "prefer concise answers over verbose ones," you can trace why the model behaves a certain way and adjust the principle directly.
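One way to make principles inspectable in a product is to store them as data with stable identifiers, so a behavior can be traced back to a specific line and changed in one place. The sketch below assumes the principles are rendered into a prompt; the `Principle` class, `render_principles`, and `update_principle` are hypothetical names for illustration, not part of any particular framework.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Principle:
    id: str    # stable identifier used when tracing a behavior
    text: str  # the human-readable rule


def render_principles(principles: List[Principle]) -> str:
    """Render the principles as tagged lines that can be placed in a prompt."""
    return "\n".join(f"[{p.id}] {p.text}" for p in principles)


def update_principle(principles: List[Principle], principle_id: str, new_text: str) -> List[Principle]:
    """Return a new list with one principle's text changed; the input is not modified."""
    if not any(p.id == principle_id for p in principles):
        raise KeyError(f"unknown principle id: {principle_id}")
    return [replace(p, text=new_text) if p.id == principle_id else p for p in principles]


PRINCIPLES = [
    Principle("concise", "Prefer concise answers over verbose ones."),
]
```

If answers turn out too terse, calling `update_principle(PRINCIPLES, "concise", ...)` changes the rule directly, and the old and new lists can be diffed or version-controlled like any other configuration.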

Common confusion: A constitution does not guarantee ethical behavior. The principles are only as good as the people who wrote them, and the training process can still produce unexpected gaps.