Glossary definition


Long context

Models that can accept very large inputs, often hundreds of thousands or millions of tokens, letting you pass in entire books, codebases, or document collections at once.

A long-context model can take an entire codebase or a year of meeting transcripts as input and answer questions about them. This sounds like a silver bullet: just paste everything in and ask. The advertised token limit is a ceiling for what fits, though, not a guarantee of quality. Research consistently shows models miss information buried in the middle of long inputs, even when the total length is well within the stated limit. Long context is powerful when combined with good information structure, and unreliable when treated as a substitute for it.

Builder example

If you are considering replacing your document search pipeline with "just paste it all in," test first. Take a long document, embed a specific fact at different positions, feed the whole thing to the model, and ask about the fact. Many teams discover that a retrieval pipeline (which selects and ranks the most relevant pieces) beats brute-force long context on accuracy while costing less per call.
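A minimal sketch of that test harness, assuming filler text stands in for a real document and `ask_model` is a hypothetical wrapper around whatever model API you use:

```python
def build_haystack(needle: str, position: float, total_words: int = 5000) -> str:
    """Insert `needle` into filler text at a relative position (0.0 = start, 1.0 = end)."""
    words = ["lorem"] * total_words  # stand-in for your real document text
    idx = int(position * len(words))
    words.insert(idx, needle)
    return " ".join(words)

# One probe per depth: beginning, middle, and end of the document.
needle = "The Q3 budget was approved at $1.2M."
for depth in (0.0, 0.5, 1.0):
    doc = build_haystack(needle, depth)
    prompt = f"{doc}\n\nQuestion: What was decided about the Q3 budget?"
    # answer = ask_model(prompt)  # hypothetical call to your model API
    # record whether `answer` mentions "$1.2M" at each depth
```

If accuracy drops at depth 0.5 but not at the edges, you are seeing the "lost in the middle" effect the definition warns about.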

You paste a two-hour meeting transcript into a chat and ask what was decided about the budget. The decision was in the middle, and the model misses it.

Search for the relevant section first, then ask about that section specifically. Targeted questions get better answers than broad ones over large inputs.

Common confusion: Long context means the model can read more in one sitting. It does not mean the model remembers anything across sessions. Each call starts fresh unless your application explicitly stores and reloads information.
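To make the statelessness concrete, here is a minimal sketch of an application carrying memory itself, with `stored_notes` as a stand-in for whatever persistence your app uses and `ask_model` again hypothetical:

```python
# The application, not the model, remembers things across sessions.
stored_notes: list[str] = []  # persisted by your app (file, database, etc.)

def build_prompt(user_message: str) -> str:
    """Prepend previously stored notes so each fresh call sees the needed history."""
    history = "\n".join(stored_notes)
    return f"Previous notes:\n{history}\n\nUser: {user_message}"

stored_notes.append("Budget decision: Q3 capped at $1.2M.")
prompt = build_prompt("Remind me what we decided about the budget.")
# response = ask_model(prompt)  # without the notes above, the model
# would have no knowledge of anything from earlier sessions
```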