Meli AI Mentoring

1.6

They transform every failure into a reusable specification or standard

Chapter Progress: Early Draft

Chapter Progress

Agent instructionsPaid book

The practitioner allows himself to experience surprise, puzzlement, or confusion in a situation which he finds uncertain or unique.

— Donald Schön (1983) — The Reflective Practitioner

Read a frustrating output as a specification you have not written yet

Skip this step and the same failure costs you twice: once when the output disappoints you, and again next month when an almost identical request disappoints you the same way. The fix lived in your head, so it could not travel to the next prompt. Most AI failures trace back to a gap between what you meant and what the model received.

Patience with repair pays off only while the loop is converging. When two clean corrections do not settle it, the move is to change the approach rather than keep pushing the same prompt.

Take a recent frustrating AI interaction, diagnose it with five questions, name the single gap that did the most damage, and rewrite the exchange as one saved specification your prompts can reuse, so the next run starts a level up instead of from scratch.

A failure is a specification trying to be written. The frustration you feel when AI output misses the mark usually points at something you assumed and did not say: audience, tone, evidence bar, order of operations, constraint priority, or task boundary. That unstated boundary is the specification. Every serious interaction hands you two things, the output you asked for and evidence about the system that produced it, and the frustrating output is the loudest piece of that evidence. Write the missing specification down and save it as a rule your prompts reuse, and a one-time irritation stops repeating, because your saved prompts apply the fix on every later run.

Hand-drawn process diagram showing an AI failure being diagnosed for context gap, criteria, and task boundary before becoming a reusable specification for the next run. — A frustrating output becomes useful when you can name the missing specification it exposed.

A workflow that works once will drift

Renee builds a workflow for weekly client updates. She loads context, sets constraints, and gives an example of the tone she wants. The first output is excellent: specific, concise, exactly right for her client. She saves the prompt. The next week she swaps in new project notes and runs the same prompt for a different client. The output ignores two constraints and falls back on a generic corporate tone. She rewrites the prompt and it improves. A month later a model update changes how the prompt behaves, and the careful workflow she built now produces slightly off results, so she cannot tell whether the problem is her prompt, the new model, or something she forgot to specify.

Renee's experience is the ordinary cost of working with a powerful system whose strengths are uneven, whose behavior can shift with each update, and whose reading of your instructions depends on context you may not realize you left out. What keeps Renee improving past this point is that she treats the failure as material: something to diagnose, something to repair, and something to turn into a rule the system reuses. That posture runs on more than patience. It mixes curiosity about why the output drifted, a willingness to experiment with the setup, and the discipline to encode what she learns so she never has to relearn it. The compounding-loop chapter introduced this unevenness as the ; here you build the part of the loop that takes a single failure, writes the fix, and saves it so the system climbs a level.

Make your tacit criteria explicit so the model can meet them

The gap between what you meant and what the model received has a common shape. You know which unstated preference matters, what 'good' means in your situation, and which exception is obvious in your world. The model may not have access to those tacit criteria unless you supply them. It fills the gaps with reasonable defaults, and when those defaults miss, the output can feel unintelligent even though the model was doing exactly what it was told.

'Make this more professional' carries a different meaning for every reader

You ask AI to make a draft 'more professional.' The model returns something stiff, formal, and corporate. You are annoyed, because you meant warm, concise, credible, and free of sales language. The model optimized for a reasonable reading of 'professional' that happened to be wrong for your audience.

The deeper issue is that 'professional' was a compressed instruction. It assumed the model would infer your taste, your audience's expectations, and the specific tone that creates credibility in your field. The repair is to become more articulate about your own criteria. The more precisely you describe what 'good' looks like, the less the model has to guess, and increasingly capable models can help you surface a preference you could not name on your own.

This builds on the context-loading skill from earlier in the book, working at a more mature level. Beginning is supplying role, audience, and goal; mature context loading is making your tacit criteria explicit. Tacit criteria are the preferences, exceptions, and quality standards that feel obvious to you and stay invisible to anyone, or anything, that has not worked alongside you for months. As your saved criteria accumulate, a loose request lands against them, so even an underspecified prompt can come back closer to what you meant.

Pause at the irritation and ask what it is pointing at

When an AI interaction frustrates you, something useful is happening beneath the annoyance. The frustration is almost always pointing at a specific category of problem: something underspecified, something mis-sequenced, something over-delegated, or something outside what the model does reliably right now. The annoyance is the cue to slow down for a moment and read which category you are in.

Power users pause at the moment of irritation and ask five diagnostic questions. You do not have to run them by hand. You can paste the failed exchange into the model and have it work through the questions with you, then judge its answers against what you meant. The model compares your prompt against the output and proposes where the gap is; you decide whether it is right.

The five diagnostic questions sort almost every failure into its cause.

What did I assume the model understood? Unstated assumptions are the most common source of failure. The model cannot read between the lines of your prompt the way a colleague who knows your work can.
What did it optimize for instead of what I wanted? The model will still prioritize some reading of the request, which may not match the one you intended. Naming what it prioritized shows you what your prompt communicated versus what you meant.
Did I give it an example of good output? An example closes the gap between your criteria and the model's reading of them faster than any other input.
Did I ask for too many things at once? A task that overloads a single prompt tends to split the model's attention, so you get shallow work across every dimension instead of depth on the one that mattered.
Is this a prompt problem, a model problem, or a task-boundary problem? Some failures clear with a better prompt. Some need a more capable model. Some need you to do part of the work yourself. The diagnosis decides the repair.

Run these five questions on each failure and you stop stalling at the frustration and start reading it for what to fix next. Every failed interaction teaches you something about how you communicate with AI. Over a few weeks you develop a faster sense for which category of failure you are looking at, and your repairs get shorter.

Bad AI output rarely has a single cause. The safety researcher James Reason studied this pattern in complex systems and described it with an image: picture each layer of a system as a slice of Swiss cheese, each slice with small holes. Your prompt might be vague (one hole). The context might be incomplete (another hole). The task might sit at the edge of what the model does reliably (another hole). A bad result gets through when holes in several layers line up at once. The practical consequence is forgiving: you rarely have to fix everything. Find the most fixable hole, close it with a written specification, and the output often improves enough to be useful.

Prompts are graded, not pass-fail, so closing one gap usually shifts the result

Reason built the Swiss-cheese image for physical safety systems, where a defense either holds or fails: a valve is open or shut, a guard is present or missing. Prompts and context are graded rather than binary. A prompt can be a little vague or very vague; context can be thin or nearly complete. So when you close the most fixable hole, you are usually not flipping the whole result from broken to fixed. You are nudging a graded system enough that the output crosses the line into useful.

This is why the diagnosis points you to where to look first, not to a rule that one fix is always enough. If the most fixable hole turns out to carry most of the damage, one specification clears the run. If the damage is spread across several thin layers, you close them in order of how much each one costs you, retesting after each. The beginner move is the same either way: fix the cheapest hole first and see how far the result moves.

Walk one real failure from weak output to saved standard

Diagnosis is easier to trust once you have watched it run from start to finish. Here is one full repair on the example this book returns to: getting AI to write you a useful morning brief, a short rundown of your day, as one piece of a personalized setup tailored to how you live and work. As of 2026 that looks like describing what you want to a chat model and refining what it gives back. The interface will change, to voice, to glasses that carry world knowledge, to something we have not named yet. The repair discipline will not.

The request. You are setting up a morning brief. You tell the AI: 'Each morning, give me a short summary of my day so I can plan.'

The weak output. Back comes a tidy paragraph: a motivational line about seizing the day, a generic list of 'review your priorities' and 'stay hydrated,' and a closing nudge to 'make it a great one.' None of it touches your calendar, your inbox, or the one client call you are nervous about. It reads like a brochure for someone's morning, not yours.

The diagnostic questions. You paste the exchange back and ask the model to run the five questions. The answers converge fast. What did you assume it understood? That 'my day' meant your real schedule, not a generic template. What did it optimize for? Sounding upbeat and complete, because nothing told it what a useful brief contains. Did you give it an example of good output? You did not, so it had nothing concrete to match. Too many tasks at once? No, the request was narrow. Is this a prompt, model, or task-boundary problem? A prompt-and-context problem: the model never received the distinction that would have made the brief yours.

The missing distinction. A brief you can plan from is built from your specifics, not from advice about mornings in general. You wanted the three things on your calendar before noon, anything in your inbox that needs a reply today, and a one-line flag on the meeting you are dreading. The word 'summary' carried none of that, so the model filled the space with pleasant filler.

The corrected instruction. You rewrite it as a specification: 'Each morning, read my calendar and inbox. Give me my first three time-bound commitments before noon, list any message that needs a reply today, and name the one item on the calendar I am most likely to put off with a single sentence on why. No motivational language. If my calendar is empty, say so instead of inventing a schedule.'

The improved result. Now the brief names your 9:30 client call, flags the two emails waiting on you, and notes that the call follows a delayed invoice, so you walk in prepared. The repair is not the prize; the saved standard is. You add those lines to the brief's reusable instructions, so tomorrow's brief starts from the specification instead of from a blank request, and the next setup you build for yourself inherits the same rule: name my specifics, drop the filler.

Apply the principle to your own setup instead of rebuilding it each morning

That brief is one piece of the larger contrast. The non-application of this principle treats each failure as a one-time annoyance: the brief disappoints you, you hand-edit it that morning, and tomorrow you ask again and edit again, because the correction never left your head. The setup never improves; it only gets patched daily. Apply the principle and you fix each failure once, then write the rule that prevents it, so the system keeps the correction. When the calendar sync misses an all-day event, you fix today's brief and also write the rule that all-day events count, then save it. Over a few weeks of small repairs, your personalized setup stops being something you rebuild every morning and becomes something that already knows how you want your mornings to read.

Schedule a short review so saved prompts do not drift

Growing AI capability changes the upkeep on your saved work rather than removing it. Saved prompts, project standards, and automated workflows compound in value, and they are also living systems that need maintenance. A prompt that worked beautifully on one model can become too restrictive, too vague, or oddly misaligned on a newer one. A can accumulate instructions that conflict with each other. An automation can degrade quietly when a tool it depends on changes.

Power users build a light maintenance rhythm: a short quarterly review that keeps their saved prompts, standards, and workflows current. A kept standard is only worth keeping while it still holds, so the review is where you test that it does. What is worth checking sits in one family of drift.

: a prompt that produced excellent results three months ago can produce mediocre ones today because the model reads the same instructions differently. When you move to a new model, have AI re-run the prompts you rely on against a saved example of good output and flag any that drift.
bloat: over time a project gathers standards, reference documents, and rules added for specific situations, and some of them start to conflict. Ask the model to scan your saved standards and rules for contradictions each quarter, so conflicts surface before they produce confusing output.
Stale automation: a workflow that chains several AI steps can break quietly when one model in the chain changes behavior. Build verification steps into your automations, and read the output of long-running workflows periodically instead of assuming they still work.

A grows more useful as it grows and less useful if you never sharpen or retire anything in it. The habit to build is a short quarterly review: re-test your top five prompts, delete anything you have not used in three months, and update saved standards that reference a constraint that has gone out of date. Use enough of this practice to protect the work, not so much that the upkeep becomes the work.

Correct the model the way you would redirect a capable colleague

Earlier in the book you met follow-up moves for refining output that is already close: critique, reframe, compare, simplify, deepen, fact-check, adapt, tighten, and challenge assumptions. This section addresses a deeper skill, clearing up confusion when the interaction itself is going wrong.

When the model keeps missing, the instinct is to add more words, more constraints, more instructions. Often the cleaner move is a plain statement of what went wrong, delivered the way you would redirect a capable colleague who misread the assignment. Six repair phrases cover most of those redirections, and they work because each one names the specific failure pattern.

"You answered a nearby question, not the one I meant." This redirects without starting over. It tells the model its answer was reasonable and still off target.
"You are optimizing for completeness, but I need decision usefulness." When a model leans toward covering everything, this phrase switches it from thoroughness to highlighting what matters for the choice in front of you.
"You followed the format but missed the purpose." The output looks correct structurally and misses the point. This tells the model to go deeper on intent.
"You are being too agreeable. Challenge the assumptions." When a model leans toward helpfulness and you need a thinking partner, you sometimes have to request friction explicitly.
"You are treating all constraints as equal. Here are the non-negotiables." A long prompt with many constraints invites the model to balance them. Ranking the constraints tells it where to spend its effort.
"You are making this sound polished, but I need it to be true." When a model leans toward fluent phrasing and accuracy matters more than style, say so directly.

These phrases are productive because they name the specific failure pattern instead of expressing general dissatisfaction. 'Do it better' gives the model nothing to work with. 'You optimized for completeness when I needed decision usefulness' gives it a precise course correction. The underlying skill is the same one that makes you a better collaborator with people: the ability to articulate what went wrong without abandoning the working relationship.

Reset the approach when two clean corrections do not converge

Patience with AI pays off only while the is moving toward a better result. When it stops moving, the question is not whether to give up. It is whether this task can reach the quality you need from the model you have, and if so, what would push the system there: a more capable model, a smaller task, a worked example, or context the model never received. Three signs tell you the current repair path has stopped being productive.

The same error returns after two clear corrections. If you have specified the problem twice and the model keeps producing the same failure, the task may sit outside its current capability for this prompt structure. Try a more capable model, break the task into smaller pieces, or rebuild the prompt around a worked example before deciding the model cannot reach it.
The output is technically correct and wrong for the situation. Some tasks turn on social context, organizational dynamics, or personal relationships the model has not been given. Supply that context where you can, since the gap is usually missing input rather than missing capability, and make the call yourself only where the context cannot be handed over at all.
You are spending more time repairing than producing. If five rounds of correction have not produced usable output, the efficiency case has flipped for now. Do this round yourself, and note it in your reference as a current frontier boundary so you can ask the model to try again as it improves.

Power users are patient, and they are also pragmatic. They put work into repair while the loop is converging, and they change the approach when it is not. Both decisions are skill. The underlying idea holds the section together: repair while the gap is narrowing; change the approach when the loop stops learning. A failed output deserves diagnosis before you abandon it, and two clear corrections that produce no visible convergence are the cue to ask a sharper question: can this be done by AI at all right now, and if it can, what change to the model, the task, the example, or the context gets it there. Where the answer is a clear no for today, you handle this round yourself and leave a note to revisit it, because the boundary keeps moving.

Diagnose and repair one frustrating interaction this week

Diagnose and repair a frustrating AI interactionPaid book · Claude diagnoses why a recent AI interaction failed and builds a reusable repair sequence. You review the diagnosis and save the correction pattern.

References

1 source

1
Human Error
James Reason · 1990 · Cambridge University Press, 1990; and 'The contribution of latent human failures to the breakdown of complex systems.' Philosophical Transactions of the Royal Society B, 1990.
Reason argued that accidents in complex systems are rarely caused by one mistake. They occur when several latent weaknesses, defenses with gaps in them, line up so that a hazard passes through all of them at once. He pictured the defenses as stacked slices of Swiss cheese: each slice has holes, and harm reaches the end only when the holes momentarily align.
A frustrating AI output is the aligned-hole moment: a vague prompt, thin context, and a task near the model's edge combine to let a bad result through. You do not have to repair all three. Diagnose which hole did the most damage, close that one with a written specification, and the next run usually clears.