Context / Standard term
Context window
The total amount of text a model can take in and produce in a single call, measured in tokens (roughly three-quarters of a word each).
Everything the model reads and writes has to fit inside this window: your system instructions, the conversation so far, any retrieved documents, tool definitions, tool outputs, and the model's own response. Think of it like a desk. A bigger desk lets you spread out more papers, but covering every inch makes it harder to find the one you need. Modern models advertise windows from hundreds of thousands to over a million tokens, yet practical quality often drops well before that ceiling.
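A quick budget sketch makes the arithmetic concrete. This is a minimal Python example with illustrative numbers only; the window size and per-component counts are assumptions, not any provider's real figures.

```python
# Illustrative token budget for a single call. Every number below is
# an assumption chosen for the arithmetic, not a real limit.
CONTEXT_WINDOW = 200_000   # advertised window, in tokens

system_instructions  = 1_500
conversation_history = 12_000
retrieved_documents  = 45_000
tool_definitions     = 3_000
tool_outputs         = 8_000

input_total = (system_instructions + conversation_history +
               retrieved_documents + tool_definitions + tool_outputs)

# Whatever is left is the ceiling on the model's own response.
room_for_response = CONTEXT_WINDOW - input_total
print(f"Input uses {input_total:,} tokens; {room_for_response:,} left for output.")
```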
Builder example
If your app retrieves company documents and pastes them into the prompt alongside conversation history and instructions, the total can quietly exceed the window. When it does, the API typically rejects the request with an error, or a client library truncates the input and loses information. Tracking token usage per call helps you catch this before users do.
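One way to do that tracking is a pre-flight count before each API call. The sketch below assumes the tiktoken library and its cl100k_base encoding as a stand-in tokenizer; other model families tokenize differently, so treat the counts as estimates.

```python
import tiktoken

# Assumption: the cl100k_base encoding approximates your model's tokenizer.
# For other model families the exact count will differ.
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Approximate token count for one piece of the prompt."""
    return len(enc.encode(text))

def fits_in_window(parts: list[str], window: int, reserve_for_output: int) -> bool:
    """True if every prompt part plus reserved output space fits the window."""
    used = sum(count_tokens(part) for part in parts)
    return used + reserve_for_output <= window

# Hypothetical prompt pieces; in a real app these would be your system
# instructions, conversation history, and retrieved documents.
parts = [
    "You are a helpful support assistant for Acme Corp.",
    "User: How do I reset my password?",
    "Doc: To reset your password, open Settings and choose Security...",
]

if not fits_in_window(parts, window=200_000, reserve_for_output=4_000):
    print("Over budget: trim retrieval results or summarize the history.")
```

Counting on your own side, rather than waiting for an API error, also gives you a natural place to trim retrieval results or summarize history when the budget is tight.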
Everyday example
You paste an entire project folder into a chat and ask a specific question. The model misses the answer because it was on page 47 of 100.
Give the model the specific file or section it needs, not the entire folder. Less clutter, better answers.
Common confusion: A larger context window means the model can accept more input. It does not guarantee the model will pay equal attention to all of it. In practice, models recall material near the start and end of a long prompt more reliably than material buried in the middle, and overall quality declines as the window fills up.