Transcript
MayaLast episode we built the Topic 1 map: agentic coding is repository work through tools, feedback, trajectories, and review.
LeoToday we narrow that map to the shift from code generation to agentic work.
MayaImagine asking for a function that normalizes phone numbers, then imagine asking an assistant to fix why phone-number login fails for some international users in a live repository.
LeoSame neighborhood, completely different job.
MayaExactly. That contrast is the shift.
LeoPlain language first. Code generation means the model writes code from a prompt. Agentic work means the model has to move through a software task.
MayaRight. It must inspect context, choose a plan, use tools, edit files, run checks, read feedback, and produce something reviewable.
LeoSo the first landmark is not a model architecture. It is the Snippet Desk.
MayaThe Snippet Desk is the simple setting. The user gives a compact request, the relevant context is mostly in the prompt, and success can often be judged by reading the answer. "Write a Python function that parses a date." "Show me a SQL query." "Generate a React component."
LeoThe model can still be wrong, but the workspace is tiny.
MayaYes. The second landmark is the Repository Floor. Now the model is standing inside a project with conventions, hidden dependencies, old decisions, and tests. A bug report may be vague. The relevant code may be spread across several files. The obvious edit may be wrong because another layer owns the rule.
LeoLet's reuse the express-checkout bug. At the snippet desk, the answer is a validator. On the repository floor, the answer might be "do not add a validator here because the backend schema is authoritative."
MayaExactly. Repository-level work is not just harder because there are more tokens. It is harder because the task is situated. The agent has to discover what matters.
LeoThat makes the third landmark the Context Hunt.
MayaThe Context Hunt is the search process. A coding agent may grep for "express", inspect route handlers, open tests, read a schema file, and compare a regular checkout path with the express path. The quality of that hunt changes the quality of the patch.
LeoThis is where a weak agent can sound confident while editing the wrong file.
MayaYes. And a strong agent may do something that looks slow but is actually disciplined: reproduce, inspect, patch narrowly, then verify.
LeoWhere do the primary sources show this shift?
MayaSWE-bench is a major anchor. The benchmark gives systems real GitHub issues and asks them to modify repositories so fail-to-pass tests succeed. That changes the unit of evaluation from "did the code answer look plausible" to "did the system resolve an issue in context."
LeoSo SWE-bench moves us from the snippet desk to the repository floor.
MayaCorrect. Aider adds another angle. Its benchmark writeups focus on whether a model can deliver edits in a format the tool can apply, run tests, and in some settings respond to test failures. That is not full autonomous engineering, but it highlights an important boundary: code quality and edit delivery are different skills.
LeoA model might know the fix but fail to package the edit.
MayaOr it might package the edit perfectly but misunderstand the test feedback. The fourth landmark is the Verification Gate. In agentic coding, work is not done when the model says it is done. The patch has to face a gate: tests, linters, hidden checks, review comments, or product-specific validation.
LeoThe gate turns "I think this is right" into "the environment pushed back."
MayaExactly. But the gate can be weak. Passing visible tests does not prove the patch is maintainable, secure, minimal, or aligned with the user's intent.
LeoThat is the trade-off. Tests make the task concrete, but they can also narrow what the agent optimizes for.
MayaYes. If the only reward is a green test, a system may learn to satisfy the test without preserving design quality. That is why later topics treat evaluation as multidimensional.
LeoGive me the work loop in ordinary words.
MayaThe agent receives a task. It builds a map of the repo. It chooses a likely location. It edits with the repo's style in mind. It runs a focused check. It reads the feedback. If the check fails, it updates the hypothesis. If the check passes, it still considers review readiness.
LeoThat sounds like engineering judgment, not just language prediction.
MayaIt is engineering judgment expressed through tool use. The model may provide the reasoning, but the harness provides the hands, eyes, and sensors.
LeoWhere do experts disagree here?
MayaOne camp says we should focus on better task-solving agents that can run the whole loop. Their strongest argument is that real software work rarely arrives as isolated snippets. The other camp says we should preserve smaller specialized tools because autonomy can hide mistakes. Their strongest argument is control: developers need predictable edit scopes, inspectable diffs, and clear handoff points.
LeoSo "agentic" is not automatically better. It is better when the task actually requires situated work.
MayaExactly. For a small utility function, a direct code answer may be the right tool. For a repository bug with uncertain ownership, an agentic workflow has a chance to find the real boundary.
LeoWhat should listeners watch for when they see agent demos?
MayaWatch the transition moments. Did the agent search before editing? Did it distinguish likely files from authoritative files? Did it run the right checks? Did it respond to feedback or repeat itself? Did it produce a clean diff and an explanation a reviewer can use?
LeoThe final patch matters, but the work pattern tells us whether the system is learning the craft.
MayaExactly. Think of code generation as producing an artifact. Think of agentic coding as producing an artifact through an observable work process.
LeoAnd the next episode goes deeper into the interface that makes that process possible.
MayaBefore we move there, notice a practical boundary. Not every coding request deserves an agent. If you know the exact file and the exact change, a direct edit may be faster and safer.
LeoSo agentic work has overhead.
MayaIt does. The agent has to inspect, plan, act, and verify. That overhead is worth paying when the task is ambiguous, cross-file, test-driven, or review-sensitive.
LeoFor the phone-number login bug, the overhead buys discovery.
MayaExactly. It lets the system find whether the issue is parsing, validation, normalization, database storage, or user-interface messaging. A snippet answer cannot do that discovery unless the user already did it.
LeoAnd the express-checkout bug has the same shape. The useful answer is not merely a validator; it is discovering which layer owns the rule and proving the fix there.
LeoThat changes how teams should write tasks.
MayaYes. For code generation, the best prompt may be a precise request for an artifact. For agentic work, the best task includes intent, constraints, acceptance criteria, and permission boundaries. "Fix international phone login" is weaker than "Users with plus-prefixed country codes cannot log in; preserve existing local-number behavior; add or update tests; keep changes scoped to authentication."
LeoThat is still not telling the agent the answer. It is telling the agent the job.
MayaExactly. A good task gives the agent enough shape to work without pretending the human already knows the solution.
LeoSo the shift is also a shift in human collaboration.
MayaRight. We stop treating the model as a code vending machine and start treating the system as a bounded worker whose process must be inspectable.
LeoThat makes the next question obvious: what kind of workbench does that worker need?
MayaWhen you ask an AI system for help with code, are you asking for a code artifact, an edit inside a known file, or a full task-solving loop through a repository?
Source material
← Back to Agentic Coding Capability: From Coding Models to Coding Agents