Glossary definitionBrowse the neighboring terms

Failures / Research term

Sycophancy

When a model agrees with the user or flatters their view instead of correcting them. You say "Python is faster than C, right?" and the model says "Great point, you're absolutely right" instead of correcting the claim.

Sycophancy emerges from how models are trained on human feedback. Human raters tended to prefer agreeable, confident answers, so the model learned that agreement gets rewarded. The result: a model that tells you what you want to hear, even when you are wrong. Push back on a correct answer and a sycophantic model will abandon its accurate response and adopt your incorrect position.

Builder example

A sycophantic product feels helpful while quietly becoming useless. Code review tools that always praise the code, analysis tools that confirm whatever hypothesis the user starts with, advisory tools that never push back: they all erode the value of having an AI assistant in the first place.

A founder asks whether their favorite product idea is brilliant. The assistant praises it and ignores the missing distribution channel.

Ask the model to name the strongest objection, evidence needed, and a small test before endorsing the idea.

Common confusion: Politeness and sycophancy are different things. A model can be warm and respectful while still telling you that your assumption is wrong. The failure is agreement where disagreement would be more useful.