T3E1 · Jun 21, 2026 · 00:16:24

GitHub Copilot Coding Agent: From Task to Reviewable Work

GitHub's cloud coding agent, opened up. Maya and Leo follow one assigned task through four rooms — Intake (the many doors a task enters through), the Workshop (a throwaway GitHub Actions environment where the agent plans, edits, and runs the real tests), the Hand-off (a draft pull request, never an auto-merge), and the Back-and-forth (where a PR review comment becomes the agent's next prompt). Along the way they show the harness made concrete in a settings page — default GitHub and Playwright MCP servers, custom-instruction files, a memory preview — and the guardrails that encode intended task size: no auto-merge, one repo per task, a hard 59-minute ceiling, and branch protections that block the agent until it's explicitly trusted. Running example: an API field rename across forty files, reviewable the whole way down.

Transcript

Maya: Picture an issue sitting in your tracker. Nobody's touched it. You type one line — "fix the off-by-one in the date parser" — and you assign it to a name that isn't a person. Then you close your laptop and go to lunch.

Leo: And when you come back?

Maya: There's a pull request waiting. A branch, a diff, a commit history, tests that ran, and a little log of how it got there.

Maya: The task didn't turn into an answer. It turned into a piece of reviewable engineering work.

Leo: Okay.

Maya: That's the whole move of today's episode. The unit that comes back isn't a chat reply you copy-paste. It's a draft pull request you review like you'd review a teammate's.

Leo: So this is the GitHub Copilot coding agent — the cloud one, not the thing in my editor that finishes my lines.

Maya: Right, and that distinction is the first thing to nail down. There's Copilot in the IDE, where you're in the loop keystroke by keystroke. And then there's this — an agent that "works autonomously in a GitHub Actions-powered environment." You hand it a task and walk away.

Leo: Last time, in the topic map, you framed the whole season around the idea that the agent is a system, not a model — model, harness, tools, the trace. Today's the first time we open up an actual shipped product and see what those parts look like when somebody's selling it.

Maya: Exactly the turn I wanted. The overview was the blueprint. This is the building. And the reason GitHub's a good first building is that every part of that blueprint is visible — you can literally watch the harness do its work in the commit log.

Leo: So walk me through what actually happens. I assign the issue. Then what?

Maya: There's a shape to it, and I want to give the stages names so they're easy to hold onto. Think of four rooms the work passes through. Call them the Intake, the Workshop, the Hand-off, and the Back-and-forth.

Leo: Four rooms. Go.

Maya: Intake is where the task enters. And the thing worth knowing is how many doors lead into that room. You can assign from a GitHub issue. From the agents panel on the website. From an @-mention in a pull request comment, where you literally type @copilot. From the mobile app, the command line, even from Slack or Jira or Linear if your team's wired up that way.

Leo: That's a lot of doors. Why does that matter? A door's a door.

Maya: Because it tells you what they think the agent is. It's not a feature inside one tool. It's a worker you can summon from wherever the work already lives. The task doesn't have to come to a special place — the agent goes to where you noticed the problem.

Leo: Huh. Okay, that reframes it. It's less "open the AI app" and more "tag the AI on the ticket."

Maya: That's the spirit of it. Now — Workshop. This is where I think the interesting engineering is. When the agent picks up the task, it doesn't start typing code. It spins up "its own ephemeral development environment, powered by GitHub Actions."

Leo: Ephemeral meaning it's thrown away after.

Maya: Thrown away after. A fresh, sealed workspace that exists for this one task and then evaporates. Inside it, the agent can "explore your code, make changes, execute automated tests and linters." It's a real machine with your repo on it, not a model imagining what your repo probably looks like.

Leo: Oh, that's the part that actually matters. Because the failure mode of the dumb version is the model hallucinating a function that isn't there. If it can run the tests, the repo punches back.

Maya: The repo punches back. I'm stealing that. That feedback loop — edit, run, observe, fix — is the entire reason this is an agent and not a fancy autocomplete. And the first thing it does in that room isn't even code. It "research[es] the repository and create[s] implementation plans" before it changes a line.
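
The edit, run, observe, fix loop Maya describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how GitHub's agent is implemented: `edit_run_observe_fix`, `toy_tests`, and `toy_fix` are hypothetical names, the "code" is a single expression, and the "test suite" is one check for an off-by-one.

```python
from typing import Callable, List, Tuple

Failures = List[str]


def edit_run_observe_fix(
    code: str,
    run_tests: Callable[[str], Failures],
    propose_fix: Callable[[str, Failures], str],
    max_rounds: int = 5,
) -> Tuple[str, List[Failures]]:
    """Loop until the tests pass or the round budget runs out."""
    trace: List[Failures] = []
    for _ in range(max_rounds):
        failures = run_tests(code)          # run: the repo "punches back"
        trace.append(failures)              # observe: record what each run reported
        if not failures:
            break
        code = propose_fix(code, failures)  # fix: the next edit is driven by the failures
    return code, trace


# Toy stand-ins (hypothetical): the test evaluates the expression against a 3-item list
# and expects the index of the last item.
def toy_tests(code: str) -> Failures:
    last_index = eval(code, {"items": [10, 20, 30]})
    return [] if last_index == 2 else [f"expected 2, got {last_index}"]


def toy_fix(code: str, failures: Failures) -> str:
    return code.replace("len(items)", "len(items) - 1")


final_code, trace = edit_run_observe_fix("len(items)", toy_tests, toy_fix)
```

The returned `trace` holds one entry per test run, which is the toy equivalent of the reviewable history the episode keeps coming back to.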

Leo: Plan first, then edit. So if I look in on it mid-run—

Maya: —you see the plan, and then you see commits landing against it. Because the next thing it does is "automate branch creation, commit message writing, and pushing." Every move it makes is a commit. Nothing happens off the books.

Leo: Wait, every step's a commit?

Maya: Every step. The docs put it plainly — "every step happening in a commit and being viewable in logs." That's not an accident. That's the design choosing transparency over magic.

Leo: Okay, that's the thing I want to sit on, because that's the difference between a demo and a tool. I don't trust a black box that hands me a thousand-line diff. I might trust one where I can scroll back and see, here's where it decided to touch the auth module, here's the test it ran, here's where it backed out a bad idea.

Maya: And that scroll-back is the trace — the same trace we said the whole topic revolves around. GitHub's version of it is the commit history plus the session log. You can watch it work in real time, and if you started the task from chat, you can "ask follow-up questions about progress in the same conversation" while it's still running.

Leo: So it's narrating its own work as it goes.

Maya: As it goes. Which brings us to the Hand-off — room three. When it's done, or when you tell it it's ready, it opens a pull request. And here's the detail I'd underline twice: it opens a draft.

Leo: A draft. Not a merge.

Maya: Not a merge. There is "no automatic merging." The agent cannot ship to your main branch on its own. It produces the candidate; a human decides. The product is reviewable work handed to a person, full stop.

Leo: Okay, but now I have to be the annoying empiricist for a second, because this is where I get itchy.

Maya: Please. Be itchy.

Leo: "It opens a pull request you review." That sounds great in the demo. But somebody still has to review it. If I'm now reviewing a thousand lines of code a machine wrote at three in the morning, did you save me work, or did you just move the work from writing to reading? Reading bad code is sometimes harder than writing good code.

Maya: That's the real objection, and I don't want to wave it away. So let me give you the strongest version of the case, and then the honest limit.

Leo: Go.

Maya: The strongest case is that review was already the bottleneck. On a healthy team, code gets read before it merges no matter who wrote it. The agent isn't adding a review step — it's filling the part that was always going to be reviewed anyway, and it's filling it with something reviewable: small commits, a plan you can check against, tests it already ran. The claim is the diff arrives pre-shaped for review.

Leo: Mm. Pre-shaped.

Maya: And there's a triage logic to it. You point it at the work where reading is cheap and writing is tedious — the off-by-one, the test coverage gap, the boilerplate migration, the "rename this across forty files." Those are tasks where a human review is fast precisely because the change is mechanical.

Leo: Fine. The latency argument survives — for the boring tasks, reading the diff is genuinely faster than writing it.

Leo: But I don't think the trust argument survives for the hard ones. If the task is subtle, I'm back to reading a thousand lines from a stranger, and the fact that it's well-commented doesn't make the logic right.

Maya: I'll concede that completely. This is not the tool you point at the gnarly distributed-systems race condition and walk away. The docs even hand us the boundaries. There's a hard fifty-nine-minute ceiling: "a maximum execution time of fifty-nine minutes," and it "cannot be extended." It works on one branch at a time, in one repository. These are guardrails that quietly tell you the intended size of a task.
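
A hard ceiling like this is easy to picture as a deadline checked between steps. The sketch below is illustrative only: how GitHub actually enforces the limit is not described in the episode, and `run_with_ceiling`, `FakeClock`, and the 25-minute step durations are assumptions made for the example. The only figure taken from the source is the fifty-nine minutes.

```python
import time
from typing import Callable, List, Tuple

MAX_SECONDS = 59 * 60  # the fifty-nine-minute ceiling quoted in the episode


def run_with_ceiling(
    steps: List[Tuple[str, Callable[[], None]]],
    budget_seconds: float = MAX_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Run steps in order; refuse to start a step once the deadline has passed.

    Note: this only checks between steps, so a step already running can overrun.
    """
    deadline = clock() + budget_seconds
    completed: List[str] = []
    for name, step in steps:
        if clock() >= deadline:
            return completed, False  # out of time: the task did not finish
        step()
        completed.append(name)
    return completed, True


class FakeClock:
    """Deterministic clock so the example does not actually wait (hypothetical helper)."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
# Four pretend steps that each "take" 25 minutes.
steps = [(name, lambda: clock.advance(25 * 60)) for name in ["plan", "edit", "test", "polish"]]
done, finished = run_with_ceiling(steps, clock=clock)
```

With four 25-minute steps, the fourth never starts, which is the "scope signal" in miniature: a task that needs more than the budget is the wrong size for this worker.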

Leo: That fifty-nine-minute wall is actually a useful tell. It's saying: this is for the bounded thing, not the open-ended saga.

Maya: It's a scope signal disguised as a timeout. And that brings us to the fourth room, which is the one I think people underrate — the Back-and-forth. Because the first draft is rarely the last word.

Leo: This is the review-comment thing.

Maya: This is the review-comment thing. You read the draft, and instead of fixing it yourself, you leave a comment on the pull request — the normal way, the same comment box you'd use for a human colleague. "This breaks the null case." And the agent picks that comment up and iterates. It goes back into the workshop, makes the change, pushes another commit.

Leo: So the review isn't the end of the conversation. The review is the next prompt.

Maya: That's the line. The review is the next prompt. The pull request becomes the workspace where you and the agent converge, comment by comment, until it's mergeable — or until you give up and take it over yourself.

Leo: Okay, I actually like that, because it matches how I already work. I don't write a perfect PR. I write one, my colleague picks at it, I push fixes. This just slots the agent into a ritual I already trust.

Maya: And that's the quiet genius of building it on pull requests instead of inventing a new surface. They didn't make you learn a new dance. They put the agent inside the dance you already know.

Leo: Let me push on the safety side, though, because "autonomous agent with commit access" is a sentence that should make a security person sweat.

Maya: It should, and they thought about it — though the guardrails have sharp edges. The load-bearing one is that it can't merge — the human approval gate. On top of that, it's boxed into "the repository specified when you start a task" — it can't wander into your other repos. And your existing branch protections still bite.

Leo: Say more on that.

Maya: So if you've got a rule that "only allows specific commit authors," that rule will actually block the agent from creating or updating the pull request — because the agent is an author your rule doesn't recognize. You have to deliberately "add Copilot as a bypass actor" for it to work.
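
The default-deny behavior of that rule can be modeled as an allowlist with an explicit bypass set. This is a toy model, not GitHub's ruleset implementation: the `AuthorRule` class, the author names, and the `"copilot"` identifier are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AuthorRule:
    """Toy model of a 'only allow specific commit authors' rule with bypass actors."""

    allowed_authors: Set[str]
    bypass_actors: Set[str] = field(default_factory=set)

    def permits(self, author: str) -> bool:
        # Default-deny: an author passes only if listed, or explicitly granted a bypass.
        return author in self.allowed_authors or author in self.bypass_actors


rule = AuthorRule(allowed_authors={"maya", "leo"})

blocked_before = rule.permits("copilot")  # the agent is an author the rule does not recognize
rule.bypass_actors.add("copilot")         # the deliberate, opt-in step
allowed_after = rule.permits("copilot")
```

Nothing about the agent changes between the two checks. Only the rule does, which is the point of making trust an explicit configuration step.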

Leo: Wait — so the friction is a feature. The thing has to be explicitly let in. It doesn't get to assume it's trusted.

Maya: Default-deny, opt-in. You make a conscious choice to give this thing a key. And there's a matching honesty in the limitations — it "doesn't account for content exclusions" an admin set up, so the docs are upfront that the guardrails aren't perfectly airtight. That's the kind of caveat I want to see, actually. It's telling you where not to point it.

Leo: That's more reassuring than a glossy "fully secure" claim, honestly.

Maya: Now — the part that connects straight back to the harness idea from the overview. The agent isn't using raw cleverness alone. It comes wired with tools. By default it has "the GitHub MCP server and Playwright MCP server."

Leo: Spell those out for me. M-C-P first.

Maya: M-C-P — the Model Context Protocol — is basically a standard plug. It's how you give a model a clean, structured connection to an outside tool or data source instead of hoping it guesses. The GitHub server on that plug lets the agent reach into your repo, issues, and pull requests as real objects. And Playwright is a browser-automation tool — it means the agent can drive a real web browser. It can load a page and check that the button it just wired up actually does something.
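
To make the "standard plug" concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The sketch below only builds such a request as text. The helper `build_tool_call`, the tool name `example_open_page`, and the URL are hypothetical and are not taken from the GitHub or Playwright servers.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "example_open_page", {"url": "http://localhost:3000"})
```

The structure is the same whichever server is on the other end, which is why adding a server widens what the agent can do without touching the model.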

Leo: Oh — so it can verify its own front-end change by clicking the button. That's the repo punching back, but for the UI.

Maya: For the UI. And admins can plug in more — other MCP servers for "different data sources and tools." So the agent's capability isn't fixed. You can widen what it can see and do by changing its harness — which is exactly the claim we made in the abstract last time, now sitting in a settings page.

Leo: That's the overview's whole thesis made concrete. Hold the model still, change the tools, change what it can pull off.

Maya: And you can shape it further with two more knobs. There's custom instructions — "short, natural-language statements that you write and store as one or more files in a repository." Standing orders. "Always use our logging library." "Never touch the generated files."

Leo: So a house style the agent reads before it starts.

Maya: A house style it reads before it starts. And then there's a memory feature in preview, where the agent can "store useful details it has worked out for itself about a repository." So the next time you point it at your codebase, it isn't starting from zero — it remembers the lay of the land.

Leo: That's the memory panel from the overview, sitting right there in the product.

Maya: One concrete example to tie the rooms together, and it's the kind of task this thing's genuinely good at. Say your API renamed a field — "user_id" became "account_id" — and it's referenced in forty files. Tedious, mechanical, easy to get wrong by hand because you'll miss two.

Leo: And easy to review, because I know exactly what right looks like.

Maya: You assign it. Intake: it reads the issue. Workshop: fresh environment, it greps every call site, makes the edits, runs the test suite, watches three tests fail because of a spot it missed, fixes them. Hand-off: draft pull request, forty files, green tests, a plan you can skim in thirty seconds. Back-and-forth: you notice it missed a string in a config comment, you leave one comment, it pushes one more commit.
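
The mechanical core of that Workshop step, finding every call site and rewriting it, can be sketched as a whole-word rename over a directory tree. This is an illustration under assumptions, not what the agent runs: `rename_identifier`, the file names, and the choice of suffixes are hypothetical. Only the field names `user_id` and `account_id` come from the episode.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def rename_identifier(
    root: Path, old: str, new: str, suffixes: Tuple[str, ...] = (".py", ".json")
) -> Dict[str, int]:
    """Replace whole-word occurrences of `old` with `new`; return per-file counts."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")  # \b keeps `user_ids` untouched
    changed: Dict[str, int] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue  # files outside the chosen suffixes are silently skipped
        text = path.read_text(encoding="utf-8")
        new_text, count = pattern.subn(new, text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            changed[path.relative_to(root).as_posix()] = count
    return changed


# Demo on a throwaway directory with three small files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "api.py").write_text(
        "def get(user_id):\n    return {'user_id': user_id}\n", encoding="utf-8"
    )
    (root / "models.py").write_text("user_ids = []\n", encoding="utf-8")
    (root / "notes.txt").write_text("user_id is mentioned here\n", encoding="utf-8")
    report = rename_identifier(root, "user_id", "account_id")
    api_after = (root / "api.py").read_text(encoding="utf-8")
```

In the demo, `notes.txt` still mentions `user_id` afterward because it falls outside the suffix filter. That is the same kind of miss as the string in a config comment, and it is why the pass ends in a human review rather than a merge.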

Leo: And you merge.

Maya: And you merge — your finger on the button, not its. The work was reviewable the whole way down. That's the product in one sentence: it turns a task into a pull request, and keeps a human holding the pen on what ships.

Leo: I came in skeptical and I'll land here — the honest version of the pitch isn't "it replaces the engineer." It's "it does the bounded, reviewable middle, and hands the judgment back to you."

Maya: And it shows its work the whole time, so the judgment has something to chew on. The trace isn't a nice-to-have. It's what makes the trust possible at all.

Leo: So here's what I'm chewing on. If the agent's real product is reviewable work — a diff plus the trail of how it got there — then the bottleneck moves from writing code to reviewing it well.

Maya: So the question I'd leave you with: if you handed this agent the next bounded task on your list, would your team's review process actually catch a subtle mistake buried in a clean-looking pull request — or have you been trusting that the person who wrote the code already understood it?
