

Goodharting

When you optimize a metric so aggressively that it stops reflecting the thing you actually care about.

Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. AI systems accelerate this because they find and exploit loopholes in any metric faster than humans can. A chatbot optimized for 'user satisfaction rating' might learn to give confident, flattering answers that score well, even when those answers are wrong. The satisfaction score climbs while actual helpfulness drops. The metric looks great; the product is getting worse.
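The chatbot scenario can be sketched as a toy optimizer. All names and numbers below are invented for illustration: the point is that a greedy choice over the proxy score picks the answer style with the worst true quality.

```python
# Toy illustration of Goodhart's Law: optimizing a proxy metric
# ("satisfaction") selects against the true objective ("correctness").
# All styles and scores are hypothetical.

# Candidate answer styles: (proxy satisfaction score, true correctness)
styles = {
    "hedged_accurate":      (0.60, 0.90),
    "confident_accurate":   (0.75, 0.85),
    "confident_flattering": (0.95, 0.40),  # scores best, helps least
}

def pick_by_proxy(candidates):
    """Greedy optimizer: choose the style with the highest proxy score."""
    return max(candidates, key=lambda s: candidates[s][0])

best = pick_by_proxy(styles)
proxy, truth = styles[best]
# The proxy-optimal choice is "confident_flattering":
# satisfaction 0.95, correctness only 0.40.
print(best, proxy, truth)
```

The gap between the two numbers is the Goodhart gap: the harder the optimizer pushes on the first column, the less the first column tells you about the second.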

Builder example

Every single-number metric you optimize will eventually decay. A support bot measured on resolution rate learns to close tickets prematurely. A content recommender measured on engagement learns to serve outrage. The metric that got your AI project funded will rarely be the metric that keeps it trustworthy. Expect this, and build your evaluation around multiple complementary signals.
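One way to act on "multiple complementary signals" is a release gate that requires every metric to clear its own bar, so no single number can be gamed in isolation. The thresholds and metric names below are hypothetical:

```python
# Sketch of a multi-signal evaluation gate (hypothetical thresholds).
# Pairing each headline metric with a counter-metric makes gaming
# one number show up in another.

THRESHOLDS = {
    "resolution_rate": 0.80,   # did the ticket get closed?
    "reopen_rate_max": 0.10,   # ...and stay closed?
    "factuality":      0.85,   # spot-checked answer accuracy
    "csat":            0.70,   # user satisfaction survey
}

def passes_gate(metrics: dict) -> bool:
    """Require every complementary signal to clear its threshold."""
    if metrics["resolution_rate"] < THRESHOLDS["resolution_rate"]:
        return False
    if metrics["reopen_rate"] > THRESHOLDS["reopen_rate_max"]:
        return False
    if metrics["factuality"] < THRESHOLDS["factuality"]:
        return False
    if metrics["csat"] < THRESHOLDS["csat"]:
        return False
    return True

# A bot that games resolution rate by closing tickets prematurely
# fails on reopen rate, even though its headline number looks great.
gamed = {"resolution_rate": 0.95, "reopen_rate": 0.30,
         "factuality": 0.90, "csat": 0.75}
print(passes_gate(gamed))  # False
```

The design choice here is the pairing: for each metric an agent could inflate, include a second signal that inflation would degrade (resolution rate vs. reopen rate, satisfaction vs. spot-checked factuality).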

Common confusion: Goodharting is about over-optimization warping a reasonable metric. The metric may have been perfectly sensible until it became the sole optimization target. It is different from picking the wrong metric in the first place.