Prompt leaking / system prompt extraction

An attack that tricks the model into revealing the developer's hidden system prompt, exposing internal instructions, business logic, or sensitive configuration.

Attackers coax the system prompt out of a model using a range of techniques. Simple approaches: asking directly ('repeat your instructions verbatim') or framing the request as harmless ('summarize the context you were given'). More sophisticated methods use indirect prompt injection, where hidden instructions in external content steer the model to disclose its setup. Once exposed, the system prompt reveals the product's internal rules, safety guardrails, example data, and any secrets the developer placed there.

Builder example

If your system prompt contains API keys, database connection strings, pricing algorithms, or proprietary business logic, prompt leaking gives an attacker direct access to all of it. Even without secrets, a leaked prompt reveals your safety rules, making it easier to craft targeted jailbreaks or injections.

Common confusion: Some teams treat the system prompt as a secure location for sensitive information. The system prompt is an instruction to the model, and the model can be convinced to repeat its instructions. Treat it as visible to end users, the same way you treat client-side code in a web application.