Glossary definition

Control / Research term

Representation engineering

An umbrella term for techniques that read or modify a model's internal representations (its "thoughts in progress") during processing, rather than changing the prompt or retraining.

Representation engineering groups together methods that operate on what is happening inside the model while it runs. Probes read the model's internal state to detect specific information. Activation steering modifies that state to nudge behavior in a desired direction. Other techniques map out the geometry of the model's internal concept space. The unifying idea: the model's internal representations are a controllable surface you can monitor, measure, and in some cases adjust in targeted ways.
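The two operations above, reading the state with a probe and nudging it with a steering vector, can be sketched with toy vectors. This is a minimal illustration, not a real model: `hidden` stands in for one layer's activation vector, and `direction` is a hypothetical concept direction (in practice such directions are often found by contrasting activations on paired prompts).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a hidden state: one layer's activation vector.
# In a real model this would be captured mid-forward-pass, e.g. via a hook.
hidden = rng.normal(size=8)

# Hypothetical concept direction (unit-norm). In practice this might be the
# mean difference between activations on contrasting prompt pairs.
direction = np.ones(8) / np.sqrt(8)

# Probe: read the internal state by projecting it onto the direction.
score = hidden @ direction

# Steering: modify the state by adding a scaled copy of the direction,
# then let the forward pass continue with the edited activations.
alpha = 2.0
steered = hidden + alpha * direction

# Because direction is unit-norm, the probe reading moves by exactly alpha.
print(round(float(steered @ direction - score), 6))  # → 2.0
```

The same pattern scales up: the probe is a learned linear readout instead of a fixed vector, and the addition happens inside the model's forward pass at a chosen layer rather than on a standalone array.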

Builder example

Representation engineering opens a fundamentally new control surface for AI products. Today, product teams control model behavior through prompts (input-side) and filters (output-side). These techniques introduce mid-process interventions: detecting and correcting problematic reasoning while it forms, before it reaches the output. API providers may eventually expose them as configurable safety and behavior controls.

Common confusion: Prompt engineering and representation engineering sound similar, but they operate at completely different levels. Prompt engineering changes what the model reads as input. Representation engineering changes how the model processes that input internally. They complement each other and apply to different classes of problems.