Meli AI Mentoring

3.3

They run important output through a second model because a model rarely edits its own blind spots well

Chapter Progress: Early Draft

Chapter Progress

Agent instructionsPaid book

The first principle is that you must not fool yourself, and you are the easiest person to fool.

— Richard Feynman (1974) — Cargo Cult Science

You ask AI for a draft, read it over, and it looks fine. Then you ask the same AI whether anything is wrong with it, and it tells you the draft is strong with a few small things you might polish. You ship it, and a reader spots the vague claim or the conclusion that does not quite follow. The check you ran felt like a second opinion. It was the same opinion asked twice. This subchapter is about why that happens and what to do instead: hand important work to a reviewer whose blind spots are not the same as the writer's.

A model is the wrong reviewer for its own work

Several forces push in the same direction here. The same patterns that produced an output also shape how that output gets judged. A model writes the way it writes because of how it was trained, and it reviews the way it reviews for the same reason. Ask it to critique its own draft and it tends to defend the choices it already made, because from the inside those choices look correct. The errors it cannot see in the writing are often the errors it cannot see in the review either.

A second model is built differently, so it fails differently. A reviewer with different blind spots catches what the writer normalized. Two perspectives flag errors that one perspective treats as fine, the same way a fresh pair of eyes on your own writing catches the typo you have read past five times. You do not run this on every task. It costs an extra pass, so you spend it where an error would cost you more than the pass does.

The move in one line: draft with one model, hand the draft to a different model with specific problem-finding instructions, and write down what the second model catches. As of 2026 that looks like copying text out of one paid chat model and pasting it into another, which is why keeping more than one such tool within reach is common. The interface will change, to voice, to glasses, to a brain-computer interface, to whatever follows. The move stays the same: an independent reviewer on work that matters.

Hand-drawn writer-editor workflow showing a writer model producing a draft, a separate editor model finding blind spots, and a revised output produced from the second opinion. — A second model helps because it brings a different error surface to the same draft.

Self-review recycles a model's blind spots; an independent review exposes them

Try it once and the pattern is easy to feel. Ask a model to write something, then ask the same model to critique what it just wrote. The critique tends to come back polite, general, and thin. It praises the structure, suggests a tighter word here and there, and leaves the structural problems untouched. The model defends its own choices because, from inside, they still look right. This tends to be hard to prompt your way out of, because the output and the self-critique grew from the same patterns, so they often share the same blind spots.

Hand the same draft to a different model and the review changes character. A different model carries different training, different habits, and a different set of weak spots. Each model family leans a certain way, and those leanings shift with every update, so treat them as something you test rather than something you assume. When one model reviews another's work, it catches the patterns the first model was built not to notice in itself. Hold what you observe as a working guess, not a law: run the same task across the models you have, and let an evaluation pass build the profile of what each one tends to miss, so the system, not your memory, carries which model to trust for which kind of check.

A reviewer's value is its independence, not its intelligence

What makes the second model useful is not that it is smarter. It is that its errors fall in different places. Think of two proofreaders who each miss certain kinds of mistakes. Hand a page to one and some errors slip through. Hand it to both and an error survives only if it hides in the spot where both happen to be weak, which is rarer. The gain comes from the overlap shrinking, not from either reviewer being flawless.

This is why a second model can fail a hard fact and still earn its place. It is not there to be right about everything. It is there to have a different set of weak spots from the writer, so the two of them together leave fewer gaps than either alone. When you choose a reviewer, independence from the writer matters more than raw capability.

Naming the durable principle keeps this from collapsing into a tool tip. The principle is to seek an independent error surface for work that matters: a check whose blind spots do not match the writer's. A second AI model is one way to get that. A human expert read, a source ledger you check claims against, a rubric you score the draft on, an adversarial checklist that hunts for specific failure types, these are others. The shape of the tool will keep changing. The requirement under it holds: the thing checking the work must be able to see what the thing that made the work cannot.

Give the second model one job, hunting weak writing, and it stops praising and starts finding problems

One practical setup gives the second model a single assignment: do nothing but hunt for weak writing. The editor model is told to find everything that is vague, generic, overconfident, thinly supported, or lazily written. It is not asked to improve the piece or to be kind about it. Narrowing the editor to a single job, finding problems, is what makes its feedback sharp. A model told to review broadly drifts back to the same gentle praise the writer would have given. A model told only to find what is wrong goes looking, and it finds more.

The pass runs in three steps, and it helps to see them as one handoff rather than a list to memorize: one model makes the draft, a second model marks it up, and you carry the marks back.

Draft with the first model. Give your preferred model full context, who it is writing as, who will read it, the goal, the constraints, and any source material, and let it produce the first version. This is the same briefing care the chapter on briefing AI like a collaborator develops; the editor can only catch problems the draft was set up to have.
Review with a second model. Paste that draft into a different model with explicit editor instructions. The instructions point at problems to find, not alternatives to write, so the output is a list of issues rather than a fresh draft that hides the old one.
Revise, then evolve the pass. Carry the editor's findings back to the original model and ask it to address the specific issues, or make the fixes yourself. The editor's job ends at naming the problems; the call on which problems are worth fixing stays with whoever is best placed to make it. Then take the level-up step a power user does not skip: turn the editor prompt that worked into a saved standard, a rule, or a reusable skill, so the next deliverable starts from a sharper check instead of from scratch.

Where the morning brief from earlier chapters was a setup you build for yourself, the running example here is a deliverable you send outward: a client proposal, a report, a strategy memo, the kind of document where a stray error reaches someone whose opinion of you it would dent. Watch the same draft handled the do way and the don't way to see the pass earn its cost. The don't way is the loop you may already run: ask one model for the proposal, skim it, ask that same model 'is this good?', get back 'yes, with minor polish,' and send it. The do way drafts the proposal with one model, then hands it to a second with editor instructions, which comes back flagging that the budget paragraph states a number the source material never gave and that the recommendation does not follow from the two findings above it. You and the writer model had both read past those. The independent reviewer had not seen them yet, so it could.

A good editor prompt names problems instead of producing a new draft

There is a tempting shortcut that undoes the whole pass: asking the second model to rewrite the draft. A rewrite hands you a different output; an editorial critique hands you a list of fixable problems in the one you have. When the editor rewrites, it buries the original's weaknesses inside its own writing, and you lose the thing you came for, which was a clear view of what was wrong. So the editor's job is to point, specifically, at what is weak, and to leave the fixing to you. The clearer the categories you give it, the less it falls back on the same generic praise the writer would have produced.

Hand the editor named categories and it hunts for problems instead of praising

Editor prompt: 'You are a demanding editor reviewing a draft written by a different AI model. Your job is to find problems. Be specific and direct. Flag: (1) any claim that is stated confidently but could be wrong or unverifiable, (2) any sentence that is vague enough to apply to almost any topic, (3) any paragraph that repeats an earlier point in different words, (4) any transition that sounds smooth but hides a gap in logic, (5) any conclusion that does not follow from the evidence presented. Do not rewrite the piece. List the problems with line-by-line specifics.'

The numbered categories are doing the work. They tell the editor what to hunt for, so it cannot retreat into 'this is a strong piece, here are a few small suggestions.' Take the categories out and the editor model gives you the same soft feedback the writer model would have. Each category is a different kind of weakness, and naming them makes each one findable.

The prose above has built the idea by hand: a reviewer whose weak spots differ from the writer's, pointed at named kinds of problems. Here is the formal term for what you are after.

This pass is the same separation-of-roles move the subchapter on giving each AI conversation one focused job teaches, applied one level out. There, you split research, strategy, and critique into separate conversations so no single thread carried too many jobs at once. Here you split writing from editing, and you put each role in a different model so their blind spots do not line up. Whether you split a conversation or split a model, the principle holds: each role does one job, and you weigh what comes back to reach the strongest version of the work. The point is not to keep your own judgment in charge for its own sake. It is to get the best result from whichever judgment is better on a given call, yours or a model's, while you stay accountable for what ships and keep sharpening the judgment on both sides.

Spend the second-model pass where an error would cost you, not everywhere

An independent reviewer costs an extra pass, so it is not free, and not every task earns it. The two-model pass is a tool for work that matters, not an efficiency tool for everything. Run it on the documents where an error would be expensive or embarrassing: client-facing work, research reports, important emails, anything that gets published, the strategic recommendations someone will act on. For routine internal work, iterating with one model until the output matches what you wanted, the move the subchapter on treating the first response as a measurement develops, is enough on its own.

A clean way to sort which work clears the bar is a question rather than a rule. Ask whether you would be embarrassed to have shipped this with an error you could have caught. If yes, the work is worth a second model's different eyes. If no, the extra pass is system you do not need, and the lighter trade-off applies that runs through this whole book: use enough check to protect the work, not so much that the checking becomes the work. The reviewer is there to keep the output honest, not to turn a quick task into a project.

Set up your first writer-editor pair

Edit a draft with independent problem-finding instructionsPaid book · Claude reviews a draft as a demanding independent editor and lists specific problems by category. You decide which findings to fix.