Meli AI Mentoring

5.5

They go beyond chat to agents that read, write, search, build, and take action

Chapter Progress: Early Draft

Men will set the goals, formulate the hypotheses, determine the criteria, and perform the evaluations.

— J. C. R. Licklider (1960) — Man-Computer Symbiosis

You ask AI to help you tidy up a folder of notes, it writes back a clear plan for how it would group them, and then nothing changes on your disk. The plan is good. You still have to open the folder yourself and drag every file into place by hand. The previous chapters had you saving prompts and building workspaces so each conversation starts higher, and each of those conversations still ends the same way: with text in a window that you then carry somewhere else by hand. This subchapter is about the moment that stops being the ceiling, when the same instruction can move the files instead of only describing the move.

An agent can act on your files, and you stay accountable for the call

Every task you bring to AI splits into two kinds of work: the busy part and the deciding part. The busy part is the handling: searching, formatting, organizing files, moving data from one place to another. The deciding part is the judgment: what counts as a good result, which grouping makes sense, which trade-off to take. A coding agent can compress the busy part almost to nothing, and it can propose strong calls on the deciding part too. Your job is to aim at the best-aligned outcome and own the call you ship, using whichever judgment is better on that particular decision, yours or the agent's, and sharpening both over time.

An agent that acts carries more risk than one that only writes text. A chat model that drafts a weak paragraph costs you a minute. An agent that runs a bad plan can overwrite files or change things that are slow to undo. So the capability jump from chat to agent comes with a matching jump in care: the first tasks you give an agent should be ones you can fully reverse.

By the end of this subchapter you will have run one real task through a coding agent, even if you have never written code, and named in one sentence what the agent did that a chat model could not.

Figure: Hand-drawn action-boundary diagram showing chat suggestions separated from an agent that reads files, writes changes, and runs commands behind boundaries and review. Caption: The jump from chat to agent is a jump from suggestions to action, so boundaries and review become part of the work.

Chat hands you text; an agent changes the world outside the window

Hold the difference in your hand with one task you can picture: you have two hundred meeting notes piling up in a single folder, and you want them sorted by project with a short index at the top. A chat model can plan that sort beautifully and cannot do it. You describe the mess, it gives you a clean scheme of subfolders and a list of which note goes where, and the work of creating those folders and moving the files is still entirely yours. Every chat conversation ends this way: the result is text, and to use it you copy it somewhere else, into an email, a document, a spreadsheet, a set of folders.

An agent closes that last gap. It can read your two hundred notes, find the recurring projects on its own, create the subfolders, move each file into the right one, and write the index, all inside the folder you pointed it at. The output is not a description of the sort. It is the sorted folder. As of 2026 you do this by typing instructions to a tool like Claude Code, Codex, or Cursor, and the interface will keep changing under you, to voice, then to glasses, then to whatever comes after. The thing that does not change is the shape of the move: the agent takes action in your working environment and leaves a result that lives outside the conversation.
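
To make the difference concrete, here is a minimal Python sketch of the folder sort, split into the two halves described above. `plan_sort` only reads and returns a grouping, which is all a chat reply amounts to; `apply_sort` creates the subfolders, moves the files, and writes the index, which is the part an agent adds. The function names, the `INDEX.md` filename, and the rule that a note's project is the text before the first underscore in its filename are assumptions for illustration; a real agent would read the notes themselves to find the projects.

```python
from pathlib import Path
import shutil


def plan_sort(folder: Path) -> dict[str, list[str]]:
    """Read-only half: group note filenames by project without touching disk.

    Assumption: the project is the part of the filename before the first
    underscore, e.g. "apollo_kickoff.md" belongs to project "apollo".
    """
    plan: dict[str, list[str]] = {}
    for note in sorted(folder.glob("*.md")):
        if note.name == "INDEX.md":
            continue  # never file the index itself under a project
        project = note.stem.split("_", 1)[0]
        plan.setdefault(project, []).append(note.name)
    return plan


def apply_sort(folder: Path, plan: dict[str, list[str]]) -> Path:
    """Acting half: create subfolders, move each note, and write the index."""
    lines = ["# Index", ""]
    for project, names in sorted(plan.items()):
        target = folder / project
        target.mkdir(exist_ok=True)
        lines.append(f"## {project}")
        for name in names:
            shutil.move(str(folder / name), str(target / name))
            lines.append(f"- {project}/{name}")
        lines.append("")
    index = folder / "INDEX.md"
    index.write_text("\n".join(lines), encoding="utf-8")
    return index
```

Calling `plan_sort` alone leaves the folder exactly as it was. Only `apply_sort` changes anything on disk, which is why the plan is worth reading before the second call runs.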

The family of tools that crosses this gap has a name: coding agents.

As of 2026, the distance between what you can finish in a chat window and what you can finish with a coding agent is one of the wide capability gaps in everyday AI use. Power users have already crossed it. The local lens here comes from Tal Raviv and Aman Khan, who argue in their Lenny's Newsletter essay 'How to build AI product sense' that even non-engineers who use coding agents like Cursor and Claude Code for daily work develop a feel for what AI can and cannot do faster than through any other activity. Directing an agent to build something teaches you AI from the inside, because you watch your instructions turn into action and see exactly where they were too vague.

You do not need to write code to direct a coding agent

The name 'coding agent' scares off the people who would gain the most from one. These tools accept plain language, and code is only the means they use to act. You say what you want in ordinary words. The agent works out how to do it, takes the steps, and shows you the result. You review and then approve, reject, or redirect. The folder-of-notes task asks for no programming on your part at all, and it is squarely the kind of job a coding agent is built to handle.

The everyday tasks an agent handles well sort into three families, which makes them easier to hold than a flat list. The first is putting things in order: reorganizing a folder of notes into subfolders by project with an index, restructuring scattered files, cleaning up a collection that has grown rough. The second is turning a pile of input into a structured output: reading every PDF in a folder and building a spreadsheet with one row per document, extracting the key finding from each, pulling a batch of documents into a single summary. The third is building a small tool for yourself: a simple page that shows your weekly goals and reading list, a tracker for the decisions you owe this week, a tiny utility no off-the-shelf product quite covers. Across all three, you describe the job in plain language and the agent supplies the doing.

Place your own tools on a capability spectrum, and the jump you are making becomes concrete. The table below names three levels. As of 2026 the tools in the right column are current examples; treat them as a dated snapshot, since the levels will outlast the brand names.

Comparison

Level	What it can do	Tools you might use (2026 snapshot)
Chat (text in, text out)	Converse, draft, analyze, summarize, brainstorm. The output stays in the conversation window, and you carry it elsewhere by hand.	Claude.ai, ChatGPT, Gemini chat
Agentic chat (text in, action out)	Everything chat does, plus reading files, searching the web, using tools, and running multi-step plans. The output can be files, searches, or structured data.	Claude with connectors, ChatGPT with tools, Gemini with extensions
Coding agents (reviewable action in a bounded space)	Everything agentic chat does, plus writing and editing files, running commands, building small applications, chaining tasks, and operating across your project. You review and approve each action.	Claude Code, Codex, Cursor, Windsurf

Most people start on level one, and the move that changes the most is the move to level three. Going from chat to agentic chat is incremental: the same conversation surface gains a few new abilities. Going to a coding agent is a different kind of change, because the agent now operates in your working environment, reads your actual files, and produces results that live outside the window. The folder gets sorted; the spreadsheet gets written; the small tool runs.

Once AI can act, the boundary does as much work as the prompt

Crossing to an agent changes the risk, so it changes the discipline. The durable principle is bounded action under supervision. Once AI can change things outside the chat window, by reading files, writing them, running commands, sending messages, a weak result is no longer just disappointing text. A chat model that writes a bad paragraph wastes a minute. An agent that runs a bad plan on the wrong folder can move or overwrite files in ways that take real effort to undo. The reach grew, so the supervision has to grow with it.

The instinct most people reach for is the wrong one, and seeing why sets up the right one. The tempting move is a shorter leash: hand the agent the narrowest permissions possible and let it touch almost nothing. That cripples the very capability you came for. Full agentic work often needs full access, because reading, writing, and running commands are how the agent does anything useful, and a tightly restricted agent on your real folder does little well. The safety cannot come from how little the agent is allowed to touch.

It comes from where the agent runs. Power users build a container rather than a shorter leash: they give the agent full permissions inside a contained space, and they make sure that space is cut off from anything a mistake could spread to. Give an agent a whole task inside a small, walled-off world rather than a tiny task inside your whole world. Inside the container the agent can work at full capability, and the consequences of any error stay inside the walls.

Building that container follows a short progression, and grouping it into stages keeps it from feeling like a checklist to memorize. Set up the contained space first. Make a dedicated folder or copy that mirrors your real project, and give the agent full access inside it, with no path to production databases, live customer data, shared repositories, or anything where a mistake spreads. The agent can do anything inside; what protects you is that the inside is sealed off. Then watch a small run before you trust a large one. Give the agent one real task in that space and review what it did: which files it read, what it changed, what it decided on its own. Small contained runs are where an agent's judgment and failure modes show up cheaply. Then widen the boundary as the agent earns it. After several clean runs, move from the throwaway copy to a working branch, then from a branch to a shared setting, each step earned by reliability in the one before. As of 2026 a common backstop for this is version control: a tool like Git records every change so you can undo anything the agent does, which is what makes widening the boundary safe to try.
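
The first stage, a throwaway copy, can be sketched in a few lines of Python. This is a minimal illustration, not a complete isolation scheme: `make_sandbox` and `discard_sandbox` are hypothetical helper names, and copying a folder only protects the original files from in-place edits. Actually cutting the agent off from everything else on your machine depends on the permission settings of whichever tool you use.

```python
from pathlib import Path
import shutil
import tempfile


def make_sandbox(real_folder: Path) -> Path:
    """Copy real_folder into a fresh temporary directory and return the copy.

    The agent is pointed at the returned path, never at real_folder.
    """
    sandbox_root = Path(tempfile.mkdtemp(prefix="agent_sandbox_"))
    sandbox = sandbox_root / real_folder.name
    shutil.copytree(real_folder, sandbox)
    return sandbox


def discard_sandbox(sandbox: Path) -> None:
    """Throw the copy away; the original folder was never touched."""
    shutil.rmtree(sandbox.parent)
```

If the run inside the copy goes badly, you discard it and nothing in the real folder has changed. If it goes well, you have something concrete to review before repeating the task on real work.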

Match the supervision to the agent's new reach, not its reputation

It is easy to relax review once an agent has impressed you a few times, and that is the moment to hold the line. The reach is what sets the supervision, not how good the last few runs looked. Read each proposed action before you approve it. Keep the agent's scope narrow until you have enough runs behind you to widen it on purpose. The capability is large, and so is the cost of handing broad authority to a system whose failure modes you are still learning. Caution here is not distrust of the agent. It is matching how much you check to how much an error would cost, the same principle the chapter on matching delegation to stakes develops in full. The aim cuts both ways: use enough boundary and review to protect the work, not so much process that guarding the agent becomes the work, so the container stays sized to what an error would cost.

Two habits keep the container honest as the work gets real. The operating order is read-only before read-write, draft before send, local before shared, reversible before irreversible. Let an agent look before it changes, draft before it sends, work on your own copy before a shared one, and start where you can undo. And when the agent does change shared work, read the change itself rather than trusting its summary. An agent will describe what it intended; the actual difference shows what it did. As of 2026 you read that difference as a diff, the side-by-side of before and after, and the principle holds whatever the future interface calls it: trust the record of what changed over the agent's account of it.
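
Reading the change itself can be sketched with Python's standard `difflib` module. The idea is to record the folder before the run, record it again after, and list every file that was added, removed, or changed, with the changed lines shown. `snapshot` and `review_changes` are hypothetical helper names, and the sketch assumes every file is UTF-8 text; with Git, `git diff` serves the same purpose.

```python
import difflib
from pathlib import Path


def snapshot(folder: Path) -> dict[str, str]:
    """Record the text of every file under folder, keyed by relative path."""
    return {
        path.relative_to(folder).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def review_changes(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return a line-by-line record of what differs between two snapshots."""
    report: list[str] = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue  # untouched files stay out of the report
        if old is None:
            report.append(f"ADDED   {name}")
        elif new is None:
            report.append(f"REMOVED {name}")
        else:
            report.append(f"CHANGED {name}")
            report.extend(
                difflib.unified_diff(
                    old.splitlines(),
                    new.splitlines(),
                    fromfile=f"before/{name}",
                    tofile=f"after/{name}",
                    lineterm="",
                )
            )
    return report
```

Take one snapshot before the agent runs and one after, then print the report and compare it with the agent's own summary. Anything the summary left out shows up here.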

Persistent instruction files carry your standards into every run

There is a quiet way to stop repeating yourself to an agent. Power users keep their standing instructions in a file the agent reads on its own, so the rules travel with the project instead of living in your memory. As of 2026 Claude Code loads a file named CLAUDE.md and Codex loads AGENTS.md at the start of a session; the names will change, the move will not. Write down once how you want your notes named, which folders are off limits, what a good index looks like, and the agent reads those standards into its context each run without being told again. The file guides the agent rather than forcing it, so how clearly you write the instructions still shapes how well they are followed, and a quick review of the result is still worth the minute. This is the same compounding move the chapter on saving reusable AI assets teaches, pointed at an agent: set up the file that briefs the agent so you do not re-brief it by hand.
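
As an illustration of what such a file might hold, the Python sketch below writes a small set of standing instructions into a project folder. Everything in `STANDARDS` is a made-up example (the naming pattern, the `archive/` folder, the `INDEX.md` layout), and `write_instruction_file` is a hypothetical helper. The file is ordinary text, so you can just as easily type it by hand.

```python
from pathlib import Path

# Hypothetical standards for a notes folder; replace them with your own rules.
STANDARDS = """\
# Standing instructions for this notes folder

## Naming
- Name each note `<project>_<YYYY-MM-DD>.md`.

## Off limits
- Do not read or change anything in `archive/`.

## Index
- Keep `INDEX.md` at the top level, one line per note, grouped by project.
"""


def write_instruction_file(project: Path, filename: str = "AGENTS.md") -> Path:
    """Write the standing instructions once so they travel with the project."""
    target = project / filename
    target.write_text(STANDARDS, encoding="utf-8")
    return target
```

Pass `filename="CLAUDE.md"` when the project is used with Claude Code. The content matters more than the filename: short, specific rules are easier for an agent to follow than a long general brief.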

Directing an agent to build something teaches you AI from the inside

As of 2026, a growing number of non-engineers are building their own small tools, automations, and applications by directing coding agents. The practice has a nickname, 'vibe coding': you describe what you want in plain language and let the agent build it. A lot of this starts as play, a 'wouldn't it be cool if I had a little page that tracked this' that you try on a whim, and the agent can widen the play by proposing versions you had not imagined. The results are functional and personal, and they often do exactly what no product on the shelf quite does, because they are shaped to one person's setup.

This is the product-sense effect the local lens named earlier, seen up close. Building something with an agent teaches you how AI works in a way that talking to it often does not. When you direct an agent to build, the run itself shows where your instructions were too thin, where the agent needed a constraint you forgot to state, and where one more example would have saved a round. You can also ask the agent to point out which parts of your request it had to guess at. You learn about context, constraints, and quality control by shipping a thing that either works or does not, which is sharper feedback than a chat reply that is merely fine. The folder you set out to sort can be your first build: a real task, fully reversible, that shows you the gap between chat and agent by closing it.

An agent acts; the boundary and the review are how you stay in charge

Chat hands you text and stops; an agent reads, writes, searches, runs, and leaves a finished result in your working environment, and as of 2026 that jump is one of the wide capability gaps in everyday AI. You do not need to write code to make it. Because an agent acts, the safety comes from a container, not a shorter leash: full capability inside a sealed space, small runs before large ones, a boundary that widens as the agent earns it, and a review of the actual change rather than its summary. You aim at the best aligned outcome and own the call you ship. Then comes the move that keeps the gains: encode the run into a reusable prompt and a standards file the agent reads on its own, so the next task starts a level higher and the system, not just the result, improves.

Run one contained agentic task and encode what it taught you

Complete your first agentic task (paid book): Claude helps you scope a contained, reversible task for a coding agent, then walks you through the run with review checkpoints. You approve each action and reflect on the capability gap.