Transcript
Maya: Last time we separated code generation from agentic work: task solving begins when the system has to discover context, act, verify, and hand off a result.
Leo: Today we name the layer that makes that work possible: the Agent-Computer Interface, often shortened to ACI.
Maya: Put a skilled engineer in front of a broken terminal, a noisy search tool, and a file viewer that hides line numbers, and suddenly that engineer looks less skilled.
Leo: So the tool surface changes the apparent intelligence.
Maya: Exactly. The ACI is that surface for a model.
Leo: Plain language version: the ACI is how the model sees and touches the computer.
Maya: Yes. It includes search commands, file viewers, editors, terminals, validators, error formats, history, and feedback messages. It is the model's workbench.
Leo: Let's start with the first landmark: the Tool Rack.
Maya: The Tool Rack is the set of actions the agent can take. Search the repo. Open a file. Edit a range. Run a test. Read output. Maybe create a branch or comment on a pull request. A good rack makes useful actions easy and risky actions visible.
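To make the Tool Rack idea concrete, here is a minimal sketch of a tool registry, not any particular agent framework's API. The names `Tool`, `ToolRack`, and the `risky` flag are hypothetical and exist only for illustration. The sketch shows two of the properties Maya describes: useful actions are listed in one place, and risky actions are labeled so they stay visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    """One action on the rack (hypothetical structure for illustration)."""
    name: str
    description: str
    run: Callable[[str], str]
    risky: bool = False  # risky tools are labeled in the listing


class ToolRack:
    """Registry of the actions an agent may take."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool: {tool.name}")
        self._tools[tool.name] = tool

    def describe(self) -> str:
        """Return the listing the agent would see, one tool per line."""
        lines = []
        for tool in sorted(self._tools.values(), key=lambda t: t.name):
            marker = " [RISKY]" if tool.risky else ""
            lines.append(f"{tool.name}{marker}: {tool.description}")
        return "\n".join(lines)

    def call(self, name: str, arg: str) -> str:
        """Run a tool, or explain which tools exist if the name is unknown."""
        if name not in self._tools:
            known = ", ".join(sorted(self._tools))
            return f"Unknown tool '{name}'. Available tools: {known}"
        return self._tools[name].run(arg)


rack = ToolRack()
rack.register(Tool("search", "Search the repo for a term.", lambda q: f"searched: {q}"))
rack.register(Tool("delete_branch", "Delete a git branch.", lambda b: f"deleted: {b}", risky=True))
print(rack.describe())
```

An unknown tool name returns a message listing the available tools rather than raising, so the agent gets something it can act on in its next step.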
Leo: In the express-checkout bug, the rack determines whether the agent can search for "express checkout", inspect the validator, and run the relevant tests.
Maya: Exactly. The SWE-agent paper and documentation are the key source here. They argue that language-model agents are a new kind of computer user, and that interfaces designed for humans are not automatically best for agents.
Leo: That is a subtle idea. The agent is not using a graphical IDE like I do. It needs a text interface shaped around its strengths and weaknesses.
Maya: Right. The second landmark is the Context Window. This is not just the model's token limit. It is the way the interface slices the repository into what the agent can inspect at each step.
Leo: Too little context and the agent guesses. Too much context and it drowns.
Maya: Exactly. SWE-agent's ACI documentation describes a special file viewer rather than dumping whole files, and a repository search command that lists matching files succinctly. Those choices push the agent toward manageable context.
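The two context choices Maya mentions, a windowed viewer and a succinct search, can be sketched as follows. This is an illustration under stated assumptions, not SWE-agent's implementation: `view_window` and `search_files` are hypothetical names, the default window size is arbitrary, and the repository is modeled as an in-memory dictionary of path to file text so the example stays self-contained.

```python
from typing import Dict


def view_window(text: str, start_line: int, window: int = 100) -> str:
    """Show at most `window` numbered lines, with counts of what is hidden."""
    lines = text.splitlines()
    total = len(lines)
    if total == 0:
        return "(file is empty)"
    first = max(1, min(start_line, total))   # clamp into the file
    last = min(total, first + window - 1)
    out = []
    if first > 1:
        out.append(f"({first - 1} more lines above)")
    for number in range(first, last + 1):
        out.append(f"{number}: {lines[number - 1]}")
    if last < total:
        out.append(f"({total - last} more lines below)")
    return "\n".join(out)


def search_files(files: Dict[str, str], term: str) -> str:
    """List matching files with match counts instead of dumping every hit."""
    hits = {path: text.count(term) for path, text in files.items() if term in text}
    if not hits:
        return f'No matches found for "{term}"'
    lines = [f'Found "{term}" in {len(hits)} file(s):']
    for path in sorted(hits):
        lines.append(f"{path}: {hits[path]} match(es)")
    return "\n".join(lines)


sample = "\n".join(f"line {i}" for i in range(1, 11))
print(view_window(sample, start_line=4, window=3))
```

The viewer always prints line numbers and says how much is hidden above and below, so the agent knows it is looking at a slice and can ask for the next one.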
Leo: So context selection is behavior design.
Maya: Yes. The third landmark is the Edit Safety Catch. Editing is where agent mistakes become real changes. An interface can require structured patches, block syntactically invalid edits, or make the agent inspect a diff before continuing.
Leo: SWE-agent used a linter around edit commands, right?
Maya: Yes. Its docs describe running a linter when an edit command is issued and refusing an edit if the code is not syntactically correct. The point is not that a linter proves correctness. It catches a class of avoidable mistakes at the moment they are introduced.
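A rough sketch of the Edit Safety Catch for Python files follows. It assumes the standard library's `ast.parse` as a stand-in syntax check; SWE-agent's actual linter setup may differ. The function name `apply_edit` and its return shape are hypothetical. As Maya says, the check is a catch, not a proof: code that parses can still be wrong.

```python
import ast
from typing import Tuple


def apply_edit(source: str, start: int, end: int, replacement: str) -> Tuple[bool, str, str]:
    """Replace lines start..end (1-indexed, inclusive) only if the result parses.

    Returns (applied, resulting_source, message). On rejection the original
    source is returned unchanged.
    """
    lines = source.splitlines()
    if not (1 <= start <= end <= len(lines)):
        return False, source, (
            f"Edit rejected: lines {start}-{end} are out of range "
            f"(file has {len(lines)} lines)."
        )
    candidate_lines = lines[: start - 1] + replacement.splitlines() + lines[end:]
    candidate = "\n".join(candidate_lines) + "\n"
    try:
        ast.parse(candidate)  # syntax check only; says nothing about behavior
    except SyntaxError as err:
        return False, source, (
            f"Edit rejected: syntax error at line {err.lineno}: {err.msg}. "
            "The file was not changed."
        )
    return True, candidate, "Edit applied."


original = "def total(items):\n    return sum(items)\n"
ok, result, message = apply_edit(original, 2, 2, "    return sum(items")
print(ok, message)
```

The rejection message tells the agent where the syntax error is and that the file is unchanged, so the next step can be a corrected edit rather than a repair of a corrupted file.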
Leo: That is a useful distinction. It is a catch, not a proof.
Maya: Exactly. The fourth landmark is the Feedback Light. After an action, the agent needs a clear signal. Did the command fail? Did it pass silently? Was there no output because everything was fine, or because the tool broke?
Leo: I remember the SWE-agent docs mentioning an explicit message when a command succeeds with no output.
Maya: Yes. "Your command ran successfully and did not produce any output." That sounds small, but it prevents the model from inventing meaning in silence.
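The Feedback Light can be sketched as a thin wrapper around a command runner. This is a minimal illustration, not SWE-agent's code: `run_and_report`, the `COMMAND FAILED` and `TOOL ERROR` labels, and the timeout default are assumptions made for the example. Only the no-output message is taken from the transcript. The wrapper separates the three cases Maya lists: the command failed, the command passed silently, or the tool itself broke.

```python
import subprocess
import sys
from typing import List

NO_OUTPUT_MESSAGE = "Your command ran successfully and did not produce any output."


def run_and_report(argv: List[str], timeout_s: float = 30.0) -> str:
    """Run a command and turn the outcome into an explicit observation."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        # The tool broke: the executable does not exist.
        return f"TOOL ERROR: executable not found: {argv[0]}"
    except subprocess.TimeoutExpired:
        return f"TOOL ERROR: command timed out after {timeout_s} seconds"
    output = (proc.stdout + proc.stderr).strip()
    if proc.returncode != 0:
        # The command ran but failed; keep the exit code visible.
        return f"COMMAND FAILED (exit code {proc.returncode})\n{output}".rstrip()
    if not output:
        # Silence is reported explicitly instead of as an empty string.
        return NO_OUTPUT_MESSAGE
    return output


print(run_and_report([sys.executable, "-c", "pass"]))
```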
Leo: In our checkout example, a focused test might pass with no output. A clear feedback light tells the agent not to chase a phantom failure.
Maya: Exactly. Interface design turns environmental facts into observations the model can use.
Leo: So in the express-checkout case, the same underlying bug can look easy or impossible depending on whether the interface makes the right validator, test command, and command result visible.
Leo: Where do experts disagree about ACI design?
Maya: One side favors richer, more expressive tools. Their strongest argument is leverage: if agents can inspect, edit, test, browse, and query structured state, they can handle complex work with fewer blind spots.
Leo: And the other side?
Maya: The other side favors narrow, predictable tools. Their strongest argument is reliability. Every new tool adds failure modes, prompt overhead, security questions, and opportunities for the agent to do something unexpected.
Leo: So the best interface is not the biggest one. It is the one that makes good work easier than bad work.
Maya: Exactly. ACI design is an incentive system. If search returns overwhelming snippets, the agent may anchor on the wrong line. If edits are too free-form, it may corrupt files. If tests are hard to run, it may skip verification. If failures are vague, it may thrash.
Leo: That makes the harness part of capability, not just packaging around the model.
Maya: Yes. In the SWE-agent paper, the authors report large performance differences when the agent uses a tuned interface. The important lesson for this series is broader than a single score: a coding agent is the combination of model, tools, feedback, and task environment.
Leo: What should a builder audit in their own ACI?
Maya: Start with the tool rack. Are the right actions available? Then the context window. Does the agent see useful slices rather than floods? Then the edit safety catch. Are obvious malformed changes blocked? Then the feedback light. Are outcomes explicit enough for the agent to revise?
Leo: And later, the trace ledger can show whether the interface helped or hurt.
Maya: Exactly. If agents repeatedly open giant files, miss obvious search terms, or misread terminal output, that is not only a model problem. It may be an interface problem.
Leo: The next episode turns that work process into data.
Maya: Before that, there is a useful builder test: compare the agent's intended action with the interface's default action.
Leo: Meaning?
Maya: If the agent wants to inspect context, does the interface default to a focused file window or a giant dump? If the agent wants to edit, does the interface default to a reviewable patch or opaque file replacement? If the agent wants to verify, does the interface surface the right test command or leave it to guess?
Leo: The defaults become steering.
Maya: Exactly. ACI design is steering without pretending to solve the task for the model. The interface can make disciplined behavior cheaper: search before edit, small patch before broad rewrite, focused test before victory claim.
Leo: And it can make risky behavior harder.
Maya: Yes. It can require confirmation for destructive commands, limit edit scope, record diffs, or flag suspicious operations. That is especially important because coding agents work in environments where a mistake can change files, leak data, or break builds.
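One of the controls Maya lists, requiring confirmation for destructive commands, might look like the sketch below. Everything here is an assumption for illustration: the function name `check_command`, the `confirmed` flag, and the short `DESTRUCTIVE_PREFIXES` list are hypothetical. A prefix check like this is a speed bump that makes risky actions visible, not a security boundary, since a command can be rewritten to slip past it.

```python
import shlex
from typing import Tuple

# Illustrative, deliberately incomplete list of command prefixes to gate.
DESTRUCTIVE_PREFIXES = (("rm",), ("git", "push"), ("git", "reset"))


def check_command(command: str, confirmed: bool = False) -> Tuple[bool, str]:
    """Return (allowed, message) for a shell command the agent wants to run."""
    try:
        tokens = tuple(shlex.split(command))
    except ValueError as err:
        return False, f"Blocked: could not parse command ({err})."
    if not tokens:
        return False, "Blocked: empty command."
    for prefix in DESTRUCTIVE_PREFIXES:
        if tokens[: len(prefix)] == prefix:
            label = " ".join(prefix)
            if confirmed:
                return True, f"Allowed after confirmation: {command}"
            return False, (
                f"Confirmation required: '{label}' is flagged as destructive. "
                "Re-issue the command with confirmation to proceed."
            )
    return True, "Allowed."


print(check_command("rm -rf build"))
print(check_command("pytest tests/test_checkout.py"))
```

The refusal message names the flagged prefix and the way forward, which keeps the gate consistent with the Feedback Light: the agent learns why it was stopped instead of seeing a bare failure.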
Leo: So the interface has a safety role as well as a productivity role.
Maya: Exactly. Tool design, feedback design, and permission design are all part of the ACI. A builder who ignores them is not evaluating a pure model. They are evaluating a model inside a poorly specified workplace.
Leo: That reframes failures. Instead of asking only "why did the model do that?", we also ask "what did the interface make easy?"
Maya: Yes. And once we log those actions, we can diagnose the pattern rather than argue from a single anecdote.
Leo: That is the bridge to trajectories.
Maya: Exactly. The interface leaves fingerprints on the trajectory. A well-designed ACI produces cleaner evidence because actions are structured, outcomes are explicit, and risky operations are visible.
Leo: The trace becomes easier to trust because the workbench made the run legible.
Maya: If you were designing a coding agent's workbench, which tool would you constrain most tightly because a mistake there would create the largest downstream risk?