Hidden chain-of-thought

A model works through a problem step by step internally, but you see only the final answer or a short summary. The thinking still happens and still costs tokens; you just cannot read the full working.

Picture hiring a consultant who solves your problem in a private office, then hands you a polished recommendation without showing their notes. The model generates reasoning tokens that guide its answer, but the provider strips or summarizes them before returning the response. Providers give several reasons for this: safety (raw reasoning may contain content that has not been checked against the same policies as the final answer), competitive secrecy (full reasoning traces could be used to train competing models), and user experience (raw reasoning traces are often long and confusing to read).

Builder example

Hidden reasoning tokens still count toward your bill and add to response latency. If your API costs spike after switching to a reasoning model, the hidden thinking tokens are likely the cause. Monitor total token usage, including reasoning tokens, when budgeting for these models.
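To make the budgeting point concrete, here is a minimal sketch of how one might split an estimated bill into input, visible output, and hidden reasoning. The Usage class, its field names, and the prices are hypothetical and not taken from any provider's API. The sketch also assumes reasoning tokens are billed at the same rate as output tokens; check your provider's pricing before relying on that.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one response (hypothetical field names)."""
    input_tokens: int
    visible_output_tokens: int
    reasoning_tokens: int  # hidden thinking; assumed billed at the output rate


def estimate_cost(usage: Usage, input_price_per_m: float, output_price_per_m: float) -> dict:
    """Split an estimated bill into input, visible output, and hidden reasoning.

    Prices are in currency units per million tokens.
    """
    input_cost = usage.input_tokens * input_price_per_m / 1_000_000
    visible_cost = usage.visible_output_tokens * output_price_per_m / 1_000_000
    reasoning_cost = usage.reasoning_tokens * output_price_per_m / 1_000_000
    total = input_cost + visible_cost + reasoning_cost
    return {
        "input": input_cost,
        "visible_output": visible_cost,
        "reasoning": reasoning_cost,
        "total": total,
        # Fraction of the bill spent on tokens you never see.
        "reasoning_share": reasoning_cost / total if total else 0.0,
    }


# Made-up numbers: a short visible answer backed by a long hidden trace.
usage = Usage(input_tokens=1_000, visible_output_tokens=500, reasoning_tokens=4_500)
report = estimate_cost(usage, input_price_per_m=2.0, output_price_per_m=10.0)
print(report)
```

With these illustrative numbers, hidden reasoning makes up roughly 87% of the estimated cost, even though the visible answer is only 500 tokens. Tracking the reasoning share per request is one way to spot where a cost spike comes from.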

Common confusion: "Hidden" does not mean absent or free. Hidden reasoning is real computation happening on real hardware. The model genuinely thinks through the problem; you simply cannot inspect the full trace.