Designing for the gap between conversation and delegation

While setting up a project recently, I wrote a very detailed document: everything I wanted, where, how, plus a scatter of considerations, ideas and variants. A brain dump. I cleaned it up, and then, instead of handing it straight to an agent to act on, I did something slightly different: I asked Claude what it had understood from the document, and how its sibling tool, Claude Code, would have read it.
The response was instructive. Asked to analyse the brief rather than execute it, the model gave me a glimpse of its own reasoning. If your tool exposes a “reasoning” or “thinking” view, I would strongly recommend expanding it: it is the most direct window you get onto how your words are being interpreted.
What I saw reminded me of something I had let myself forget. I had been using Claude as a brainstorming companion for so long that I had stopped noticing the basics: however human these tools feel, however fluently they converse, humans and machines work in deeply different ways.
The evolution of AI interaction has reached a point where the very fluency that makes these tools accessible has become a quiet source of professional friction. To test whether this was just my own experience, I observed seven people working with their AI tools in their real contexts. This is a deliberate first pass, not a finished study: enough to see whether a pattern is there and, I think, enough to show that it is. What it needs now is scale.
But even at this size, one tension surfaced again and again: the shift from exploratory conversation to precise delegation.
Direct observation
Watching people work, three broad ways of addressing an AI tool emerged.
The collaborative treat the AI as they would a human colleague: gentle, conversational, exploratory. They “talk in writing”, using the act of drafting to surface options and check their own reasoning as they go.
The commanding are blunt and direct, and seem keen to mark the difference: this is a machine, and they want the exchange to feel like one. Yet, and this was the surprise, they frequently forgot to supply context, assuming the tool already shared it. Their directness did not protect them; their results were no better than anyone else’s, only differently flawed.
The over-explainers supply far more than the task requires. Paradoxically, the extra material tends to increase ambiguity rather than reduce it, burying the actual instruction in the surrounding thought.
One finding cut across all three groups. The majority accepted the AI’s suggestions even when those suggestions had drifted from their original intent. For exploratory work, that is an advantage: a good suggestion is a good suggestion. For delegation, it is a problem, and I will come back to it.
The trap of “talking in writing”
Talking in writing is valuable in human collaboration: the meandering is the thinking. It becomes a problem the moment that same meandering is handed to an agent as a brief.
In my own case, the failure was a missing distinction between a thinking document, which explores, and a brief, which decides. Blended into one, they leave the agent guessing which sentences are open questions and which are firm commitments.
A practical heuristic helps here: before sending a brief, scan it for question marks, modal verbs (“could”, “would”) and hedges (“maybe”, “perhaps”). Each one usually marks an unresolved decision. Move them into a separate “open questions” section, and what remains is an actual brief.
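The scan described above can be roughly automated. What follows is a minimal sketch in Python, not a finished tool: the function names, the naive sentence splitter and the sample draft are all assumptions made for illustration, and the word list simply reuses the four words named in the heuristic.

```python
import re

# Hypothetical word list, seeded from the heuristic above; extend to taste.
HEDGES = ("could", "would", "maybe", "perhaps")
HEDGE_RE = re.compile(r"\b(" + "|".join(HEDGES) + r")\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def triage_brief(text):
    """Return (decisions, open_questions) as two lists of sentences."""
    decisions, open_questions = [], []
    for sentence in split_sentences(text):
        # A question mark or a hedge word marks a likely unresolved decision.
        if sentence.endswith("?") or HEDGE_RE.search(sentence):
            open_questions.append(sentence)
        else:
            decisions.append(sentence)
    return decisions, open_questions


# Illustrative draft, invented for this sketch.
draft = (
    "Score every job posting from 0 to 100. "
    "Maybe we could also track salary ranges. "
    "Should the badge reset after re-scoring? "
    "Store results in a local SQLite file."
)
decisions, open_questions = triage_brief(draft)

print("BRIEF")
for sentence in decisions:
    print(" -", sentence)
print("OPEN QUESTIONS")
for sentence in open_questions:
    print(" -", sentence)
```

A filter like this only flags candidates. A sentence such as “I would like a dark theme” is a firm decision that happens to contain a modal, so the final sorting still needs a human read.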
The dissonance of fluency
This friction has a cause. Conversational interfaces carry social signals (turn-taking, acknowledgement, the rhythm of reply), and those signals quietly persuade us that there is a shared mental model on the other side. There usually is not.
The deeper reason is not really a theory; it is economy. When a system is hard to interpret, we do not stop to build a model of how it works. We reach for the nearest familiar experience and assume it transfers: that usually works like this, so this will probably work the same way. And for most of the last decade, the nearest familiar experience has been chat with peers: Slack, WhatsApp, Teams, iMessage. That history has trained us to treat text conversation as low-stakes, repairable and tolerant of ambiguity. If a colleague misreads you, they ask; you clarify; the cost is one more message. Even that is more fragile than it looks: we stack emojis to clarify tone, and we are usually writing to people with whom we already share the broader context, yet misunderstandings still happen.
This leaves interface designers with a genuine bind. Make the interface less conversational, and you lose the ease and pleasure of the exchange, which matters, both for adoption and for the quality of thinking the tool can support. Make it more conversational, and you invite the dissonance: a surface that calls for a register the underlying machinery cannot quite honour. Most current products lean towards fluency and leave users to work out for themselves where the seams are.
The sharpest point of failure is ambiguity. Humans tolerate ambiguity well and resolve it socially by asking. Agents tend to resolve it silently, by committing to a single interpretation, often not the most pertinent one. It is worth being precise here: this is not the same as a hallucination. Inventing a source that does not exist is a hallucination. Quietly choosing one reading of an under-specified instruction is a misinterpretation, and many failures blamed on the model are really failures of the brief to make the decision for it.
That suggests a design principle: the more agentic the task, the more the interface should resist conversational fluency.
Underneath all of this is a category shift in what words are for. In conversation, ambiguity is a shared resource, something two parties hold between them and resolve together. In delegation, it becomes a unilateral commitment: whatever the agent decides, it acts on, often before you can intervene. The register has not changed. The stakes underneath it have.
A small example from my own work. A script I built scrapes job postings and scores them; the UI flags any new results. After updating my preferences and re-scoring everything, the freshly-promoted jobs were not flagged. I asked Claude to fix it, and the answer pointed back at me: the “new” badge fires the first time a job is discovered, not when its score crosses the threshold. I had said “new entries”. I had meant “new to me”. I had taken the shared context for granted, and the system committed to the more plausible alternative.
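The two readings of “new” can be put side by side in a few lines of Python. This is a sketch under stated assumptions, not the actual script: the `Job` fields, the threshold value and the run counter are all hypothetical names invented to make the distinction concrete.

```python
from dataclasses import dataclass

THRESHOLD = 70  # hypothetical cut-off above which a job is surfaced


@dataclass
class Job:
    title: str
    first_seen_run: int   # run in which the scraper first found the job
    previous_score: int   # score before the latest re-scoring
    score: int            # score after the latest re-scoring


def new_as_discovered(job, current_run):
    """Reading 1: 'new' means first discovered in the current run."""
    return job.first_seen_run == current_run


def new_to_me(job, current_run):
    """Reading 2: 'new' also covers jobs that just crossed the threshold."""
    just_promoted = job.previous_score < THRESHOLD <= job.score
    return new_as_discovered(job, current_run) or just_promoted


# An older job that the re-scoring has just promoted (illustrative values).
promoted = Job("Service designer", first_seen_run=3, previous_score=55, score=82)
current_run = 5

print(new_as_discovered(promoted, current_run))  # reading the system chose
print(new_to_me(promoted, current_run))          # reading the author meant
```

Both functions are defensible implementations of “flag new entries”. Only the brief can say which one was meant.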
The other half of the gap: deference
If the agent over-commits, the user, just as quietly, under-resists.
This was the cross-cutting finding from my observation: faced with a confident suggestion, most people took it, even when it diverged from what they had set out to do. The failure is two-sided. The machine resolves ambiguity by guessing; the human ratifies the guess by deferring.
Several things make deference the path of least resistance. Re-evaluating a suggestion against your original intent is real cognitive work, and accepting it is free. The fluent, assured register of the response makes it feel authoritative, whether or not it is. And once the agent has produced something concrete, that artefact becomes an anchor: it is easier to accept what is already on the page than to reconstruct what you actually wanted.
For exploratory work, this is harmless, even generative. For delegation, it is a slow corruption of intent: not one visible error, but a gradual drift away from the goal, ratified one small acceptance at a time. It is the kind of failure that does not show up in completion rates or satisfaction scores (the task finished, the user did not object), which is precisely what makes it worth taking seriously.
When the conversation is the only interface
This problem becomes acute in voice-first and ambient systems, where the conversation is not a layer within the interface; it is the whole interface.
In a text UI, the shift into “delegation mode” is at least partly carried by visual affordances: confirmation dialogues, plan previews, diff views, an undo button, and a sentence set in bold. Strip those away, and every word becomes load-bearing. The system has to reconstruct the signal that says “you are now committing to something” out of pacing, confirmation and repetition alone.
This is genuinely hard, because the moment you build that signal back in, you are adding friction to the very fluency that made the medium feel natural. What people love about voice and chat is precisely that it does not feel like filling in a form. And yet some tasks need to feel a little like filling in a form; otherwise the user cannot tell the moment their words stopped being speculation and became commitments.
New disciplines for a new literacy
If the problem is the felt difference between exploratory and committed speech, then the disciplines best equipped to help are not the obvious ones. They are the fields that have always treated how something is said as carrying as much meaning as what is said:
• Theatre directors, who know how a pause or a change of rhythm can rewrite the meaning of a line.
• Radio drama producers, who convey state and stakes through timing alone, with no picture to lean on.
• Simultaneous interpreters, who live full-time in the gap between colloquial source material and structured target delivery.
• Legal drafters and liturgists, who treat language as performative, where a specific sequence of words does not describe an action but is the action.
Design, historically, has worked to make the visual and structural elements of an interface legible, to make the machine understandable to humans. That problem has not disappeared, but it has shifted. Now that machines appear, to a degree, to understand us, the task is to make the machine’s response visible and controllable. And the task after that (making the temporal and prosodic elements of an exchange legible) sits much closer to the performing arts than to design as we have practised it.
Further exploration
Taking this further calls for a reportage approach: grounding these observations through sustained participant observation. Watching a genuine mix of users (experts, sceptics, resisters) navigate these register shifts in their real working contexts would let us build a proper typology of coping strategies, rather than relying on a panel of seven.
So this piece is partly an invitation. If you work with AI tools and would let me observe a real working session, I would like to hear from you. And if you are inside an organisation for which this is not idle curiosity (and anyone shipping agentic products is, whether they have named the problem yet or not), there is a proper study here worth supporting.
There is also a deeper anthropological thread worth pulling. Many users seem to treat AI capability as something like mana, an immanent, quasi-magical force, accepted and drawn upon without being interrogated. It bears less directly on day-to-day product decisions than the rest of this piece, but as a lens on how people relate to these tools, I find it hard to put down.
Ultimately, closing this gap is not about writing better prompts. It is about inventing a new literacy for a moment in which conversation has become the primary interface for action. At least until the next technology arrives and makes conversational interfaces obsolete in their turn.
A note on method
This essay was developed across a series of conversations with Claude, Anthropic’s AI assistant, used here as a thinking partner: a companion for exploration and a way to pressure-test the argument, never a source of its conclusions. Those conclusions, and any errors among them, remain mine. That an essay about the distance between conversation and delegation should itself have grown out of sustained conversation is an irony I have chosen to keep rather than hide, since it delights me.
References
Marcel Mauss, A General Theory of Magic, 1902
Daniel Kahneman, Thinking, Fast and Slow, 2011
Dan Ariely, Predictably Irrational: The Hidden Forces That Shape Our Decisions, 2008
Appendix: Strongly Suggested Reading
The article above is based on observation. Digging around afterwards, I found some papers and articles on related topics that are well worth reading.
Emrecan Gulay, Eleonora Picco, Enrico Glerean, Corinna Coupette, Relational Dissonance in Human-AI Interactions: The Case of Knowledge Work, Aalto University, Greater Helsinki, Finland, 2025
Mengke Wu, Mike Yao, After the Interface: Relocating Human Agency in the Age of Conversational AI, University of Illinois Urbana-Champaign Institute of Communications Research, USA, 2026
Ivana Bilic, Trust at First Prompt: The New Design Challenge of AI Interfaces, Honeycomb.io/blog, 2025
The register shift was originally published in UX Collective on Medium.
