Meli AI Mentoring

5.7

They match how much they delegate to how much the work matters

Chapter Progress: Early Draft


The more advanced a control system is, so the more crucial may be the contribution of the human operator.

— Lisanne Bainbridge (1983) — Ironies of Automation

The better AI handles the routine, the more your judgment decides the result

Here is a turn most people do not expect. As AI takes over the routine parts of a task, the parts that are left are the hard ones: the call that needs taste, the fact that needs checking, the tone that has to fit a particular person. Automation does not erase the human job; it concentrates it. The work that remains is rarer and weighs more, because the easy steps are gone and what stays are the steps that decide whether the result is any good.

That concentration sets up a trap. The skill you use to judge the model's output is a skill like any other, and a skill you stop practicing tends to fade. Hand a task over completely for long enough and the very ability you would need to catch a bad result can get dull from disuse. The model does not get worse; your ability to evaluate it often does. So the question is not how much you can delegate. It is which tasks keep a person's judgment exercised where that judgment shapes the outcome, and which you let go of entirely because little rides on them. Where the model's judgment is already stronger, the better move is to push the system toward that quality and build a check that runs without you, not to hold the task back out of habit.

The rest of this subchapter gives you one decision rule and the modes to act on it: match how much you delegate to how much the work matters. Full delegation where a wrong answer costs little. Close collaboration where a person's judgment still produces the better result, with the aim of the best aligned outcome from whichever judgment is stronger, yours or the model's, rather than keeping a human in the seat for its own sake.

Hand-drawn delegation map showing routine labor executed by AI, a review checkpoint, human judgment, responsibility, and skill atrophy risk. — Routine labor moves to AI, and your judgment stays practiced by passing through a review checkpoint instead of around it.

Delegating the labor still leaves you holding the call

Picture the shift in concrete terms. When AI drafts the email, summarizes the meeting, or prepares the analysis, your job is no longer to produce every line. It becomes catching what is wrong, missing, overstated, thin on evidence, or off-key for the person who will read it. That sounds lighter than writing it yourself, and it can be harder. Much of that catching can be handed off too: a judge prompt that scores each draft against your standard, or a second model asked to find the overstatements and missing evidence, will surface most of the routine slips before you ever read the output. What stays yours is the final call on the cases that need taste, relationship context, or a standard the system has not learned yet, and the less that judgment is exercised, the harder those calls get.

So delegate the labor freely and keep the judgment exercised. Let the model carry the typing, the formatting, the first pass. Keep doing, often enough that it stays in your hands, the part where you read the result against what you wanted and decide whether it clears the bar. The labor is the part that does not make you wiser to repeat. The judging is the part that does. You can hand off the first and still own the second, and owning the second is what lets the call rest on something you can stand behind.

Full delegation quietly dulls the skill you review with

There is a hidden cost to handing a task over completely, and it shows up slowly enough to miss. Run the same delegation forward in time and watch what happens to your attention. The first time you ask AI to write a client email, you read every sentence, because you do not trust it yet. After three months of letting it draft your emails, you start skimming. After six months, you might not notice that the model softened a risk you meant to flag, misstated a deadline, or struck the wrong tone for a delicate situation. You did not decide to stop checking. The checking just thinned out, one busy day at a time.

That fade is the predictable result of not practicing. Writing a good client email draws on a few skills working together: reading for tone, checking claims against what you remember, calibrating how direct to be for this particular relationship. Stop exercising those skills and they weaken, the way any unused skill does. The danger is not that the model declines. It is that your ability to tell a good draft from a flawed one declines, while the drafts keep arriving as fluent and confident as ever.

In short, a skill you delegate away stops being practiced, and an unpracticed skill quietly weakens. Research on how people use AI describes the same pattern as a spiral between confidence and skill.

Confidence in the tool and skill in yourself pull in opposite directions

A study from Microsoft Research and Carnegie Mellon found a pattern worth holding onto. The more confidence people placed in the AI, the less critical thinking they applied to its output. The more confidence they had in their own domain skill, the more critical thinking they applied. Put those two together and you get a quiet spiral: leaning on AI raises trust in the AI and cuts practice of the underlying skill, so over time you grow more trusting and, at the same moment, less able to check what you are trusting.

The way out is not to trust the model less across the board. It is to build something into your workflow that surfaces the gap between what you trust and what you can verify, so the spiral has a tripwire. The learning loops from the chapter on building learning loops around your workflows do this when you set them up to flag the places where your judgment and the model's disagree.
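One way to picture such a tripwire is a small log that records your verdict and a model judge's verdict on the same draft, then reports how often the two disagree. The sketch below is illustrative only: the `Review` record, the `disagreement_rate` and `tripwire` helpers, the sample log, and the 0.2 threshold are all assumptions made for the example, not something this chapter prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One draft judged twice (hypothetical record for illustration)."""
    task: str
    human_ok: bool  # your own verdict on the draft
    model_ok: bool  # a model judge's verdict on the same draft


def disagreement_rate(reviews: list[Review]) -> float:
    """Share of reviews where the two verdicts differ; 0.0 for an empty log."""
    if not reviews:
        return 0.0
    differing = sum(1 for r in reviews if r.human_ok != r.model_ok)
    return differing / len(reviews)


def tripwire(reviews: list[Review], threshold: float = 0.2) -> list[str]:
    """Return the tasks to re-read by hand once disagreement passes the threshold."""
    if disagreement_rate(reviews) <= threshold:
        return []
    return [r.task for r in reviews if r.human_ok != r.model_ok]


# Invented sample log: two of four verdicts disagree.
log = [
    Review("client email", human_ok=True, model_ok=True),
    Review("risk summary", human_ok=False, model_ok=True),
    Review("meeting notes", human_ok=True, model_ok=True),
    Review("deadline memo", human_ok=False, model_ok=True),
]
flagged = tripwire(log)
print(flagged)
```

The threshold is a judgment call. A low value sends you back to reading by hand more often, which is the point on high-stakes work and likely overkill on throwaway tasks.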

Choosing the collaboration mode is choosing which skill you keep

The fix is a routing rule, not a rule against delegating. Full delegation fits routine, low-stakes, well-defined work, the tasks where a wrong answer is cheap to catch and cheap to fix. Close collaboration fits work where a person's judgment currently shapes the result more than the model's does, where staying in the loop produces the better outcome. The skill is reading each task for which case it is, rather than turning a single delegation dial up for everything.

Experienced users often lean toward collaboration as much as toward more delegation. Anthropic's data on how Claude users mature points this way: many of the most practiced users are not simply handing off more, they are iterating more, working with the model collaboratively on harder problems. A large part of the road to power use runs through deeper collaboration on the work that matters, not only through delegating a wider and wider share of it. Power users tend to do both at once, pushing more routine work fully onto the system while going deeper on the calls that reward it.

It helps to picture more than two settings. The earlier chapter on choosing the collaboration mode gives you a useful starting split, directing the work yourself versus delegating it. In practice the choice spreads across four modes, and the simplest way to hold them is by who does the core work and who keeps watch:

Tool use: you invoke the model for a discrete sub-step and do the core work yourself. As of 2026 this looks like asking for one paragraph or one calculation inside a task you are otherwise driving.
Pair work: continuous partnership where you stay in the loop on every decision, trading the work back and forth turn by turn.
Delegation: the model executes the whole task and you review the output after the fact.
Supervision: the model runs on its own within bounds you set, and you watch its behavior in aggregate rather than checking each result.

Each mode keeps a different skill exercised

Each of the four modes leaves a different skill practiced or unpracticed, which makes the choice a choice about your own capability and not only about speed. Tool use preserves skill, because you do the core work and the model only fills a gap. Pair work builds skill, because the friction of staying in the loop on every decision forces you to reflect as you go. Delegation risks atrophy at the task level: you stop performing the task, so the doing-skill fades. Supervision risks atrophy one level up, at the judgment level: you stop checking individual results, so the evaluating-skill is what fades.

Reading the modes this way changes the question. You are not asking how much to trust the model. You are asking which skill you are willing to let go slack, and on which task that is a fair trade. On a throwaway task, letting the doing-skill fade is no loss. On the work your reputation rests on, it is.
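The four modes and the skill each one puts at risk can be written down as a small lookup, which makes the trade explicit when you audit a task. This is a minimal sketch under stated assumptions: the `Mode` enum, the `SKILL_EFFECT` table, the `audit` function, and the yes/no `high_stakes` flag are hypothetical names and simplifications, and real tasks rarely sort this cleanly.

```python
from enum import Enum


class Mode(Enum):
    TOOL_USE = "tool use"
    PAIR_WORK = "pair work"
    DELEGATION = "delegation"
    SUPERVISION = "supervision"


# Skill effect per mode, paraphrasing the description in the text.
SKILL_EFFECT = {
    Mode.TOOL_USE: "preserves skill",
    Mode.PAIR_WORK: "builds skill",
    Mode.DELEGATION: "doing-skill may fade",
    Mode.SUPERVISION: "evaluating-skill may fade",
}

# Modes in which some skill goes unpracticed.
AT_RISK = {Mode.DELEGATION, Mode.SUPERVISION}


def audit(task: str, mode: Mode, high_stakes: bool) -> str:
    """Flag tasks where a skill is allowed to fade on work that matters."""
    effect = SKILL_EFFECT[mode]
    if high_stakes and mode in AT_RISK:
        return f"{task}: {effect}, reconsider the mode"
    return f"{task}: {effect}, keep as is"


print(audit("reformat meeting notes", Mode.DELEGATION, high_stakes=False))
print(audit("client renewal email", Mode.DELEGATION, high_stakes=True))
```

The same mode gets two different answers here, which is the point of the routing rule: the mode is only a problem where the skill it lets fade is one the task depends on.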

The role automation makes rarer is the role it makes worth more

The risk framing above is half the picture, and the cautious half. The other half is that the human role is changing shape, and the new shape is worth more than the old one. When agents take the first drafts, the routine analysis, and the boilerplate, the human job concentrates on the parts that need judgment, taste, and thinking at the level of the whole system. That concentration is what makes the remaining human contribution worth more per hour, on the condition that you invest in the skills the new role asks for.

Ryan Nystrom, an engineering manager at Notion, names the shift plainly: 'I view our job as engineers evolving into systems thinkers and architects.' He also points to a side effect worth borrowing. When the agent produces the first pass, judging it becomes less personal. Reviewing your own work carries a charge, a quiet 'I wrote this, do not pick it apart.' Reviewing the agent's work is a flatter question: does this meet the spec? The emotional weight drops, and the quality of the evaluation can rise once it is no longer about defending your own draft.

Nystrom also leaves you a way to check your own setup. A workflow that is genuinely better should feel more relaxing, more enjoyable, and more productive at the same time. If yours feels like a trade instead, faster output at lower quality, or higher quality bought with exhaustion, the design needs work, not your willpower. When a workflow does land on all three, the defensive posture this subchapter describes loosens on its own. You keep your judgment exercised by staying in the architect seat: designing the system, writing the spec, evaluating the output, and owning the calls where your knowledge and taste produce the better result. The seat is a playful one too. A lot of the sharpest checks start as a 'wouldn't it be interesting if': what if I asked the model for a wilder version, had it attack its own draft, or invented a harder test case than the one in front of me? That experimenting is where your taste stays awake, and the best of those one-off probes is worth folding into the standard so it runs on every future draft.

Four failure modes follow from delegating without checking

When delegation runs ahead of review, the trouble tends to arrive in one of four shapes. Naming them turns a vague worry into a short checklist you can run against any workflow. The table below pairs each failure with how it builds and the move that keeps it from building.

Risk	How it develops	How to prevent it
Skill atrophy	You stop practicing a skill, your ability to evaluate AI output on that skill declines, and errors start slipping through unnoticed.	Keep the task collaborative where your judgment matters, and stay in the loop instead of skimming.
Overtrust	AI output is fluent and confident. Over time you stop questioning it and start treating a draft as a fact.	Run periodic stress tests, from the chapter on testing your tools at the edge of their capacity, and always verify high-stakes claims.
Invisible errors	The model makes a subtle mistake that reads as correct, and you miss it because you are reviewing rather than co-creating.	Run important deliverables through the workflow from the chapter on running important output through a second model.
Context loss	You delegate so much context to AI that you lose the thread of your own work and can no longer explain your own outputs.	Keep your own intent and reasoning written down beside the AI's outputs, by hand or with a prompt that logs them. The AI's reasoning is not your reasoning.

Doubt the evidence too, including this book's numbers

Keeping a clear standard includes a habit that is easy to skip: doubting the evidence, including the evidence in this book. The research behind these claims has limits, and naming them is part of the discipline, not a footnote to it. A reader who takes every cited number at face value has handed the call to the page, which is the exact move this subchapter warns against.

The cleanest productivity numbers are smaller than the loudest ones

Many self-reported productivity gains are noisy. The roughly 30 minutes a day that AI users report saving (Zoom and Morning Consult survey), the 1.5 days a week EY's Work Reimagined Survey attributes to its small set of advanced users, and the 3x improvement Mollick notes workers self-report on about a fifth of their tasks all come from user surveys, not controlled measurement. The best-controlled data sits lower and is worth more. The BCG study shows smaller but well-measured effects, roughly 12 to 40 percent on quality and speed for specific tasks, and the customer-support study by Brynjolfsson, Li, and Raymond shows a 14 to 15 percent average productivity gain that climbs to 34 percent for novices.

The same caution applies to headline tenure effects, the reports that say longer-term users succeed more often than newer ones. A raw gap like that is an unadjusted association, and the controlled advantage tends to shrink once the analysis holds other factors fixed, so the headline reads larger than the measured effect. Survivorship bias matters here too: users who could not get value likely stopped early, which inflates the apparent skill of the ones who remained. When a number like this carries weight in your decision, find the underlying report and read its own regression table rather than the press summary.
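The survivorship effect is easy to see with a few lines of arithmetic. The sketch below uses a hypothetical cohort whose numbers are invented for illustration and taken from no study: nobody in it improves over time, yet the long-term group still looks more skilled, simply because most of the users who struggled have left. The `mean_success` helper and both group lists are assumptions made for the example.

```python
def mean_success(groups: list[tuple[int, int]]) -> float:
    """Weighted mean success rate (percent) over (user_count, success_pct) groups."""
    total_users = sum(count for count, _ in groups)
    return sum(count * pct for count, pct in groups) / total_users


# Hypothetical cohort, invented for illustration only.
new_users = [(50, 80), (50, 40)]       # half get value easily, half struggle
# A year later nobody has improved, but most of the strugglers have quit.
long_term_users = [(45, 80), (5, 40)]

new_rate = mean_success(new_users)            # 60.0
tenured_rate = mean_success(long_term_users)  # 76.0
print(new_rate, tenured_rate, tenured_rate - new_rate)  # 60.0 76.0 16.0
```

In this toy cohort the whole 16-point gap comes from who stayed, not from anything the remaining users learned, which is why an unadjusted tenure gap is worth checking against the underlying report.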

Much of the sharpest writing on this topic comes from people who sell AI tools. The behavioral patterns hold up across independent sources, which raises confidence, but the selection of who is writing still tilts toward optimistic stories. The steadiest data you have is your own experience, which is why the stress tests, the verification habits, and the practices in this book exist: to keep that source reliable.

Keeping your judgment sharp leans on the same articulation skill the chapter on following up until the output matches teaches. Staying articulate about what you want, why you want it, and what good looks like is how your evaluating ability stays sharp even as you delegate more. The power user whose judgment keeps improving is the one who keeps communicating precisely with the model, holding a clear standard, rather than letting the model's fluent defaults quietly set the bar.

One line keeps this from tipping into busywork: remove the effort that does not make you wiser, and keep the effort that builds judgment. Some cognitive work is waste to repeat (reformatting, hunting for files, copying between tools), and delegating it costs you nothing. Some cognitive work is how taste, skill, and supervision grow, and delegating it costs you the growth. The aim is not to keep a human in charge for its own sake. It is to get the best aligned result from whichever judgment is stronger on the task, yours or the model's, and to keep both improving. The chapter on building learning loops around your workflows scales how you evaluate quality. The deepest move is not to catch each drifting criterion by hand. It is to evolve the system: set up an evaluation or a self-critique prompt that flags when your standard and the model's output have drifted apart, then encode the new criterion into a standard, rule, or judge prompt that catches it on every future run. Each pass hands more of the noticing to the system and frees your attention for the next level up.
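Encoding a criterion so it runs on every future draft can be pictured as a growing set of named checks. The sketch below is an assumption-heavy illustration: the `standard` dictionary, the `failed_checks` helper, the sample draft, and the keyword tests are hypothetical, and a keyword match is only a crude stand-in for a real judge prompt or evaluation.

```python
from typing import Callable

# A check takes a draft and returns True when the draft passes.
Check = Callable[[str], bool]

# The standard starts with one rule (hypothetical, keyword-based stand-in).
standard: dict[str, Check] = {
    "states a deadline": lambda draft: "deadline" in draft.lower(),
}


def failed_checks(draft: str, checks: dict[str, Check]) -> list[str]:
    """Names of the checks this draft does not pass, in the order they were added."""
    return [name for name, check in checks.items() if not check(draft)]


draft = "The deadline is Friday. Everything is on track."
before = failed_checks(draft, standard)

# You notice by hand that drafts keep dropping the risk you meant to flag,
# so you encode that criterion once instead of catching it every time.
standard["mentions the risk"] = lambda d: "risk" in d.lower()
after = failed_checks(draft, standard)

print(before)
print(after)
```

The draft passes before the new rule exists and fails after it, without anyone rereading it: the noticing has moved from you into the standard.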

Match the mode to the stakes, and keep the call yours

Delegate the labor and keep the judging where the work matters. Full delegation suits routine, low-stakes tasks, and collaboration suits the work a person's judgment shapes, because a skill you stop practicing is a skill you can no longer use to check the model. Match the mode to the stakes, watch for the four failures that follow from delegating without checking, and aim for the best aligned result from the stronger judgment, yours or the model's, while you stay accountable for the call. This routing outlives any one interface. Whether you steer through a chat box with Claude Fable in 2026, talk to earbuds and glasses, or one day think a request through a brain-computer link into a system far smarter than you, the same question holds: where does staying involved produce the better aligned result, and where can you hand the rest off and build a check that runs without you.

Audit your workflows and reset the modes that have drifted

Audit your workflows for delegation risk · Claude interviews you about your AI-assisted tasks, rates each one for stakes and judgment, and recommends which workflows to keep collaborative. You decide which mode each task gets.

References

5 sources

1
Ironies of Automation
Lisanne Bainbridge · 1983 · Automatica 19(6), 775-779, 1983.
Bainbridge identified a central paradox of automation: the more advanced an automated system becomes, the more crucial the human operator's contribution becomes. Automation changes the human job into rarer, higher-stakes supervision. The operator must monitor for failures they rarely encounter, maintain skills they seldom practice, and intervene at exactly the moments when the system's behavior is least predictable.
This 1983 insight applies directly to AI-assisted knowledge work. As AI handles routine drafting, analysis, and research, your role shifts toward the same pattern Bainbridge described: monitoring output you did not produce, catching errors in domains where your practice is declining, and making judgment calls at exactly the moments when the AI's behavior is hardest to predict.
2
Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy
Budzyń, Romańczyk, Kitala et al. · 2025 · The Lancet Gastroenterology & Hepatology, 2025.
In a multicenter observational study, the adenoma detection rate in standard non-AI-assisted colonoscopies fell from 28.4% before AI exposure to 22.4% after exposure. The result is a concrete warning sign: it does not prove that AI caused every part of the decline, and it does not directly measure knowledge-work deskilling.
Medicine provides one of the clearest measurable examples of the supervision problem: regular AI assistance can change the skill, attention, and vigilance needed to perform or evaluate the task unaided. For knowledge work, the practical lesson is to keep high-judgment tasks collaborative and periodically practice the underlying skill without AI assistance.
3
The Impact of Generative AI on Critical Thinking
Lee, Sarkar, Tankelevitch et al. · 2025 · CHI 2025. Microsoft Research and Carnegie Mellon.
Higher confidence in AI predicted less critical thinking. Higher self-confidence in one's own domain skills predicted more critical thinking. The combination creates a self-perpetuating spiral: as AI use increases confidence in AI and decreases practice of underlying skills, the user becomes simultaneously more trusting and less capable of evaluating what they are trusting.
This is the confidence-skill spiral. The intervention is to build feedback into your workflow that surfaces the gap between what you trust and what you can verify. The learning loops from the learning-loop chapter serve this function when they include disagreement analysis between your judgment and the AI's.
4
Cyborgs, Centaurs and Self-Automators: The Three Modes of Human-GenAI Knowledge Work
Randazzo, Lifshitz-Assaf, Kellogg et al. · 2025 · HBS Working Paper 26-036, 2025. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4921696
Self-Automators, who handed entire tasks to AI, were faster, but their learning suffered. The study raises long-term skill-atrophy concerns for workers who default to full delegation on tasks that require judgment.
Speed and productivity gains from full delegation come with a hidden cost: degraded ability to evaluate and improve AI output over time.
5
Spec-driven development: the AI engineering workflow at Notion
Ryan Nystrom · 2026 · How I AI podcast (hosted by Claire Vo), 2026. https://www.youtube.com/watch?v=pUHA_jNwuYE
Nystrom describes the emotional and professional experience of working with AI agents as 'more relaxing, more fun, and getting more done.' He frames the shift as an evolution from maker to architect, where the human role gains value by concentrating on systems-level decisions.
This provides the positive complement to the defensive framing of judgment protection. Power users who internalize the architect identity stop feeling threatened by AI capability and start feeling amplified by it.