Meli AI Mentoring

2.2

They match the input medium to the task because voice, text, and images each capture different context

Chapter Progress: Early Draft

Chapter Progress

Agent instructionsPaid book

You sit down to ask AI for help, you know roughly what you want, and you type a short prompt because typing the whole thing out feels like work. The answer comes back thin, so you patch it by hand, and tomorrow you do the same again. The trouble is not the model and not your idea. The trouble is that the way you handed the task in left most of what you knew behind. This subchapter is about that handoff: which input medium you choose, and how much of your real context survives the trip into the prompt.

How you hand a task in decides how much context survives

Start with one plain fact about your own mind. You can hold only a few pieces of information in your head at once, and typing a prompt asks that small to do three jobs at the same moment: figure out what you mean, choose the words, and trim what feels like too much. Trimming wins, because typing is slow and your patience is short, so the prompt that reaches the model is a compressed version of what you really hold in your head.

Speaking lifts one of those three jobs off the . When you talk a task through, you skip the editing step, so the asides, the constraints, and the details you would have trimmed come out on their own. You hand the model the unfiltered version of your thinking, and you let the model do the structuring afterward. For many people, on many tasks, that single change is the difference between a thin answer and a useful one.

Voice is not better everywhere, and the point is not to replace your keyboard. Some tasks want the precision that slow, deliberate typing gives you, where choosing the exact word is the work. The skill underneath is choosing, for this task, the medium that carries the most of what matters with the least friction. The rest of this subchapter builds that skill on one running example, then names the principle, then shows how far the same handoff goes.

Hand-drawn routing diagram comparing voice, text, image, and file inputs as different context channels routed through task fit into clearer context. — The best medium for a task is the one that loses the least context that this task depends on.

Thin prompts come from the cost of typing, not from a thin idea

Here is the running example for this whole subchapter: you need AI to write a project update for your team. Imagine you had to explain that same update to a colleague at the next desk instead of typing it. In three minutes of talking you would give them the project name, where it stands right now, the risk that is keeping you up, who needs to hear what, the office dynamics around it, and the tone that lands well with this particular group. None of that feels like extra effort when you say it out loud.

Now watch what happens when the same task goes in as text. The don't-apply version is the four-word prompt: 'write a project update.' Every one of those spoken details got trimmed at the keyboard, because spelling them out felt like too much to type, so the model receives the compressed husk of what you know and answers the husk. The do-apply version hands the model the three-minute version, the one you would have given the colleague, and lets the model structure it into an update. Same idea in your head, same model on the other end. What changed is how much of your context made it through the handoff.

The prose has built the felt idea: the thinness of the answer tracks the thinness of the input, and the input goes thin because typing is costly, not because you had little to say. Here is the plain name for what speaking gets past the keyboard.

Philosopher Michael Polanyi, who first described , put it in one line: 'We can know more than we can tell.' The details that decide whether your project update lands are exactly the ones you rarely type: which unstated preference matters to your manager, what the real deadline is behind the official one, which stakeholder needs to hear it first. You carry and use these every day, and you skip them at the keyboard because each one feels too small to spell out for a single prompt. Speaking surfaces them on its own, because saying a detail aloud costs almost nothing. That is the deeper reason talking a task through works: it turns the knowledge you live by into context the model can use right now.

The model wants information density, not polish

It is fair to worry that a rambling, repetitive memo will confuse the model. As of 2026 it usually does not. Current models are built to read past filler words, false starts, and circling back, and to pull the structure out of loose speech, though you still review the result. So the comparison is not polished-versus-sloppy. It is dense-versus-thin. A wandering 600-word memo that names the client, the deadline, the political weight, and three examples of what good looks like gives the model far more to work with than a clean 40-word prompt that compresses all of it into 'write a client update.'

This is why the repair is never to type more carefully. It is to stop paying the typing tax at all on the kinds of tasks where your context is rich, and to let speech carry the density instead.

The voice-memo workflow keeps getting rediscovered on its own

The same move keeps getting rediscovered, which is a sign it tracks something real about how input works. Product leads describe dictating a long background ramble into AI by voice memo and getting far stronger results than typing short prompts did: they talk through the problem for two or three minutes, the pain point, the constraints, the success criteria, the edge cases, and then ask AI to extract and structure the key points.

Others describe the same shape of workflow from a different starting point: record a voice memo on a walk, transcribe it, then ask AI to pull out the insights and structure them. The memo is not a recording for its own sake; it is a way to load context. It carries far more of what the speaker knows into the prompt than a quick typed line ever would, which is the whole point of choosing voice for a context-rich task.

Follow three steps to turn a spoken ramble into a structured output

You need no special tools to run this once. As of 2026, most current smartphones have a built-in voice recorder or recording app, and the major AI tools take direct voice input, an uploaded audio file, or a pasted transcript. The way you give AI input will keep changing under this, toward always-on voice, then glasses and earbuds that listen as you go, and eventually interfaces that read intent more directly still. The three steps below survive each of those moves because they describe what you are doing, getting more of your real context across with the least compression, not which button you press.

Talk the task through for one to three minutes. Do not plan or polish. Say the situation, the goal, the audience, the constraints, and anything else that surfaces. When you hear yourself add 'oh, and one more thing,' keep going, because that afterthought is usually the tacit detail the model most needs.
Hand the recording to AI. Upload the audio, speak directly in voice mode, or paste a transcript. As of 2026 all three work, and current models often handle loose, repetitive, wandering speech well, though you still review what comes back.
Ask AI to extract and structure it. A request like 'From this voice memo, name the key points, the constraints I mentioned, and the open questions, then draft the update I described' turns the unstructured ramble into a structured output you can edit.

You can collapse the recording step entirely by speaking straight into the chat. As of 2026, Claude, ChatGPT, and Gemini all take voice input in their mobile apps, so you can talk to the model directly instead of making a separate memo first. The effect on context is the same either way: speech carries more than typing. Use whichever feels more natural for the task in front of you.

Each medium keeps a different kind of context, so the fit decides

Voice is one tool among several, and naming the goal underneath it lets you choose well when voice is the wrong one. The durable principle under voice, text, images, and file attachments is reducing how much useful context you compress on the way into the prompt. Voice tends to reduce compression for many people because speaking skips the editing step. Screenshots, documents, transcripts, sketches, and worked examples each reduce a different compression for a different task. What you are choosing is the medium that keeps the most of what this task needs and loses the least.

Sort the media you have by the kind of context each one keeps, and the choice gets easier to make on the spot. The four families below cover the common cases; each one names what the medium keeps and what it gives up, so you can match it to the task in front of you.

Voice keeps your unfiltered thinking and gives up precision. Reach for it when the task turns on tacit context you would trim at the keyboard, like a project update loaded with audience and tone, and not when the exact wording is the work.
Text keeps precision and gives up spontaneity. Reach for it when the task hangs on a specific phrase, a careful definition, or a constraint that has to be stated exactly, and accept that you will leave some background unsaid.
A screenshot keeps visual state and gives up the surrounding story. Reach for it when what you are pointing at is on a screen, like an error, a layout, or a chart, where words would lose the exact arrangement the model needs to see.
A document or file keeps source detail and gives up your emphasis. Reach for it when the ground truth lives in a report, a spreadsheet, or a , and add a line of your own telling the model what in it to weigh most.

The habit is to weigh, before you type, which loss would hurt this task most, and pick the medium that avoids it. A weak output is not always a weak prompt. Sometimes it is the right words in the wrong medium, like a screenshot's problem squeezed into a sentence, or a tone-heavy update flattened into four typed words. You do not have to carry the whole diagnosis yourself: when an answer comes back thin, you can paste it back and ask the AI whether the task would have been better served by a different medium, and let its read sharpen yours. Choosing the medium is the move that loads richer context before you ever start refining the answer.

The prose above has built the idea from the running example outward. Here is the principle stated plainly enough to carry to any task.

Talk to AI now and feel the context a typed prompt would have dropped

Turn a voice memo into a structured deliverablePaid book · Claude extracts the key points, constraints, and open questions from your voice memo and produces the deliverable you described, then names the specific context a bare typed prompt would have dropped.

A between the ramble and the build lets you review intent before execution

The exercise above makes one output from one memo. You can add a single step in the middle that changes what voice input can reach. Instead of asking AI to produce the finished deliverable straight from your memo, you ask it first to produce a : a short, structured description of what the deliverable should contain, who it is for, and what counts as success. You read the spec, and only then do you hand it to a second AI session and say build it.

Engineers building software features describe doing this routinely. They open a voice transcription tool, talk through how a feature should work without editing themselves, hand the transcript to an AI agent with the instruction to write a , then open a second agent, point it at the spec, and say build it. The dictated idea becomes a structured spec, and the agent builds directly from the spec in a single clean pass.

When you read and correct the first, you can run the two passes fast and still catch a wrong turn early. Voice on its own gives you rich but loose context. The turns that context into a reviewable contract: here is what the output should contain, here is its structure, here are its constraints. You read the spec in half a minute. If it captures what you meant, the second pass can build with high fidelity. If it misses something, you fix the spec before any building happens, which is far cheaper than fixing a finished deliverable that went the wrong way. Pinning the intent down in the spec is what lets you turn the agent loose on the build without losing the thread.

This works for any deliverable with a definable shape, well beyond code. A client proposal has required sections, an audience, a tone, and success criteria. A project plan has milestones, dependencies, and outputs. A research brief has a question, sources, and a standard for what counts as verified. Talk through what the deliverable has to contain, ask AI to write the , read it, then ask AI to build from it. Two passes, each with one clear job.