The New Engineering Workflow: Agents, Reviews, and Shipping Faster
When coding agents enter a team, the unit of work shifts from lines typed to changes reviewed, and the real engineering effort moves with it.
A coding agent can now produce a 600-line pull request in four minutes. It compiles. The tests pass. The description is articulate, the commit history is tidy, and the whole thing is plausibly correct. So who reads it?
That question is the entire story. For most of software’s history, the scarce resource was the act of writing code: turning intent into working syntax was slow, and everything in our process — pairing, code ownership, “10x engineer” mythology — assumed authoring was the bottleneck. Agents quietly removed that assumption. The constraint didn’t disappear; it moved. The unit of work is no longer “lines typed,” it’s “changes reviewed and trusted enough to merge.” Once you internalize that shift, a lot of your team’s habits stop making sense, and a few you’ve neglected suddenly become load-bearing.
From authoring to directing
The day-to-day work of a senior engineer using agents looks less like typing and more like running a small team of fast, literal, slightly overconfident juniors. You’re not writing the function; you’re specifying it, watching the attempt, and deciding whether the attempt is right. The skills that matter shift accordingly: decomposing a problem into verifiable pieces, writing a crisp acceptance criterion, recognizing a wrong approach in the first 30 seconds of reading a diff rather than at the end of it.
This is genuinely good news for people who were already strong at design and review, and a rude surprise for people whose value was raw output speed. An agent will out-type anyone. What it won’t do — reliably — is know which of three plausible designs survives contact with next quarter’s requirements, or notice that the “clean” refactor just broke an invariant that lives three services away and exists nowhere in the code it was shown.
The job was never typing. Agents just made that obvious, and a little embarrassing.
The mental model that works for us: treat the agent as a force multiplier on your judgment, not a replacement for it. If your judgment about a change is fuzzy, the agent multiplies the fuzz. It will confidently build exactly the wrong thing, beautifully, and hand it to you with a paragraph explaining why it’s correct.
The review bottleneck is the new constraint
Here is the uncomfortable arithmetic. If generation gets 5x faster and review stays at the same speed, you have not made the team 5x faster. You’ve made a bigger pile of work in front of the same reviewer. Throughput is set by the slowest stage, and review is now reliably the slowest stage.
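To put numbers on that arithmetic, here is a minimal sketch. The rates are invented for illustration (changes per day for a hypothetical team), and the two-stage model ignores everything except generation and review.

```python
# Illustrative numbers only; none of these rates come from a real team.

def pipeline_throughput(generation_rate: float, review_rate: float) -> float:
    """Changes merged per day: the slower stage sets the pace."""
    return min(generation_rate, review_rate)


def backlog_after(days: int, generation_rate: float, review_rate: float) -> float:
    """Unreviewed changes waiting after `days`, starting from an empty queue."""
    return max(0.0, float(generation_rate - review_rate) * days)


# Before agents: authoring is the bottleneck (4 changes/day written, 6/day reviewable).
before = pipeline_throughput(generation_rate=4, review_rate=6)

# After agents: generation is 5x faster, review capacity is unchanged.
after = pipeline_throughput(generation_rate=20, review_rate=6)

print(before, after)   # 4 6
print(after / before)  # 1.5
print(backlog_after(10, generation_rate=20, review_rate=6))  # 140.0
```

Under these assumed rates, a 5x gain in generation yields a 1.5x gain in merged changes, plus a queue of 140 unreviewed changes after ten working days.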
Worse, review quality degrades under volume in a way that’s hard to see. A reviewer who approves twelve agent PRs in an afternoon is not reviewing the twelfth the way they reviewed the first. They’re pattern-matching: tests green, diff looks like the others, LGTM. That’s not review, it’s a rubber stamp with extra steps, and it’s exactly how a subtly wrong change ships with a human name on the approval.
So the leverage question for an engineering leader is no longer “how do we generate more code.” You have that. It’s “how do we make changes cheap to review and safe to trust.” Everything below is in service of that.
Make changes cheap to review
The single highest-impact move is shrinking the diff. A 600-line agent PR is a review liability regardless of who or what wrote it; a 600-line agent PR that nobody truly read is a future incident. Push back on size the same way you would with a human author, except now you have a tool that’s happy to redo the work as three small PRs instead of one large one — so there’s no excuse not to.
- One PR, one intent. If the description needs the word “also,” split it.
- Acceptance criteria written before the agent starts, not reverse-engineered from the diff.
- A diff a competent reviewer can hold in their head in one sitting.
- The reasoning (“why this approach”) in the PR body, so review is about the decision, not just the syntax.
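The checklist above could be turned into a pre-review check along these lines. This is a rough sketch, not a real tool: the `PullRequest` shape, the 200-line threshold, and the function name are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List

MAX_CHANGED_LINES = 200  # assumed threshold; pick one that fits your team


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a PR for this sketch."""
    description: str
    changed_lines: int
    acceptance_criteria: List[str] = field(default_factory=list)
    rationale: str = ""


def review_readiness_problems(pr: PullRequest) -> List[str]:
    """Return the reasons a PR is not yet cheap to review (empty if none)."""
    problems = []
    if re.search(r"\balso\b", pr.description, re.IGNORECASE):
        problems.append("description says 'also': split into one PR per intent")
    if not pr.acceptance_criteria:
        problems.append("no acceptance criteria written before the work started")
    if pr.changed_lines > MAX_CHANGED_LINES:
        problems.append(f"{pr.changed_lines} changed lines exceeds {MAX_CHANGED_LINES}")
    if not pr.rationale.strip():
        problems.append("no 'why this approach' in the PR body")
    return problems


big = PullRequest("Fix refund rounding and also rename helpers", changed_lines=600)
small = PullRequest(
    "Reject refunds larger than the original charge",
    changed_lines=40,
    acceptance_criteria=["refund over the charge raises RefundExceedsCharge"],
    rationale="Validate in refund() so every caller gets the same check.",
)

for problem in review_readiness_problems(big):
    print(problem)
```

A check like this cannot judge whether the criteria are good ones; it only makes the cheap failures visible before a human spends attention on the diff.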
Tests and CI are the trust layer
If review is the bottleneck, automated verification is what lets you safely review less of each change by hand. This is the part teams under-invest in and then regret.
When a human writes code, the tests are evidence the author understood the problem. When an agent writes code, the tests are the only thing standing between “it claims to work” and “it works” — and the agent will cheerfully write tests that assert the bug it just introduced. So the trustworthy pattern is to keep humans authoring the contract and let the agent satisfy it. Specify behavior first, ideally as a test or a precise acceptance criterion, then let the agent make it pass:
    # Human writes the contract:
    import pytest

    # make_order, refund, and RefundExceedsCharge come from the application under test.
    def test_refund_rejects_amount_over_original_charge():
        order = make_order(charge_cents=5000)
        with pytest.raises(RefundExceedsCharge):
            refund(order, amount_cents=5001)

    # Agent makes it pass. The test is the thing you actually review.
Your CI pipeline matters more than ever as the merge gate: type checks, a real test suite, linting, and ideally a layer the agent can’t trivially game, such as integration tests, property-based tests, or contract tests against real schemas. A repo with strong CI lets you trust agent output in proportion to what that CI actually verifies. A repo with flaky tests and 30% coverage gives you almost no signal, which means every agent change needs a full manual read, which means you’ve gained nothing.
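As one illustration of a check that is harder to game than a single example test, here is a property-style test written with only the standard library. The `Order` class and `refund` body are stand-ins invented for this sketch (in practice `refund` would be the code the agent wrote), and a dedicated library such as Hypothesis would generate inputs more thoroughly than this hand-rolled loop.

```python
import random
from dataclasses import dataclass


class RefundExceedsCharge(Exception):
    """Raised when refunds would total more than the original charge."""


@dataclass
class Order:
    # Hypothetical minimal order model for this sketch.
    charge_cents: int
    refunded_cents: int = 0


def refund(order: Order, amount_cents: int) -> None:
    """Stand-in implementation; imagine this is what the agent produced."""
    if amount_cents <= 0:
        raise ValueError("refund amount must be positive")
    if order.refunded_cents + amount_cents > order.charge_cents:
        raise RefundExceedsCharge(amount_cents)
    order.refunded_cents += amount_cents


def check_refunds_never_exceed_charge(trials: int = 1000, seed: int = 0) -> None:
    """Try many random refund sequences and assert the invariant after each step."""
    rng = random.Random(seed)  # fixed seed keeps the check deterministic in CI
    for _trial in range(trials):
        order = Order(charge_cents=rng.randint(1, 10_000))
        for _attempt in range(rng.randint(1, 5)):
            try:
                refund(order, rng.randint(1, 10_000))
            except RefundExceedsCharge:
                pass  # rejecting an oversized refund is the expected behavior
            # Invariant: total refunded never exceeds the original charge.
            assert 0 <= order.refunded_cents <= order.charge_cents


check_refunds_never_exceed_charge()
```

The human still owns the invariant; the random inputs just make it harder for an implementation to pass by handling only the one case a hand-written example happens to cover.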
This is also why agent-friendly and human-friendly repos are the same repos. Good docs, clear module boundaries, fast deterministic tests, a readable README that states invariants — these help the agent get it right the first time and help the reviewer verify it quickly. Investment here pays twice.
Where humans stay firmly in the loop
Not everything should be delegated, and pretending otherwise is how teams get burned. The line we draw: agents are excellent at well-scoped changes inside an established design, and untrustworthy at the design itself.
Keep humans authoritative on architecture, service boundaries, data models, and anything touching auth, money, migrations, or deletion. These are the decisions that are cheap to get wrong in the moment and ruinously expensive to unwind later — exactly the category where an agent’s confidence is most dangerous, because it has no skin in next year’s on-call rotation. Use agents to explore options here (“sketch three approaches to this schema change”) but make a human own the choice.
The practical heuristic: the more irreversible the change, the more human judgment per line. A throwaway internal script can be 95% agent-driven. A change to how you store customer payment tokens is a human decision that an agent may help implement, under close review.
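One way to make that heuristic mechanical is to route changes to a review tier based on what they touch. The sketch below is an assumption-laden example: the directory names and tier labels are hypothetical, and a real policy would need to reflect your own repo layout.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical directory names; adjust to your repository.
HIGH_RISK_DIRS = {"auth", "billing", "migrations"}
LOW_RISK_DIRS = {"scripts"}


def review_tier(changed_paths: Iterable[str]) -> str:
    """Pick a review tier from the directories a change touches."""
    parts_per_path = [set(PurePosixPath(p).parts) for p in changed_paths]
    # Any touch on a risky surface escalates the whole change.
    if any(parts & HIGH_RISK_DIRS for parts in parts_per_path):
        return "human-owned"
    # Only changes confined entirely to throwaway areas get the light tier.
    if parts_per_path and all(parts & LOW_RISK_DIRS for parts in parts_per_path):
        return "light"
    return "standard"


print(review_tier(["scripts/backfill_report.py"]))
print(review_tier(["services/billing/tokens.py", "scripts/backfill_report.py"]))
print(review_tier(["services/search/ranker.py"]))
```

The escalation is deliberately one-way: a single risky file pulls the whole change into the strictest tier, which mirrors the idea that irreversibility, not size, sets the amount of human judgment required.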
The new pitfalls
Old failure modes don’t go away; new ones arrive, and they’re sneakier because the output looks so finished.
Review fatigue. The big one, covered above, and worth naming explicitly as a capacity problem, not a discipline problem. You cannot will yourself into reviewing unlimited diffs well.
Silent scope creep. Ask an agent to fix a bug and it may also rename three variables, “improve” an unrelated function, and reorganize imports. Each change is defensible; collectively they bury the one diff that mattered under noise that’s tedious to review and easy to wave through.
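Scope creep of this kind is easier to catch when the task declares its scope up front. A minimal sketch, assuming the brief lists allowed path patterns and the list of changed files is available (for example from your version control tooling); the function name and paths are illustrative.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def out_of_scope(changed_paths: Iterable[str], allowed_patterns: Iterable[str]) -> List[str]:
    """Return changed files that match none of the patterns declared in the brief."""
    patterns = list(allowed_patterns)
    return sorted(
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    )


# Hypothetical brief: fix a refund bug, touching only the refund module and its tests.
allowed = ["payments/refund.py", "tests/test_refund*.py"]
changed = [
    "payments/refund.py",
    "tests/test_refund.py",
    "payments/utils.py",   # "improved" along the way
    "orders/models.py",    # reorganized imports
]

print(out_of_scope(changed, allowed))
# ['orders/models.py', 'payments/utils.py']
```

Flagged files are not necessarily wrong, but they are a prompt to either split them into their own PR or ask why the agent touched them.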
Confident wrong paths. An agent that misreads the goal doesn’t stall and ask — it commits, fully, and produces a coherent solution to the wrong problem. The earlier you inspect, the cheaper the correction. Watching the first few steps of an agent’s plan beats reviewing its finished 500-line monument to a misunderstanding.
Diffs nobody truly read. The quiet one. Approval becomes a reflex, the green checkmarks do the persuading, and three weeks later you’re in a postmortem reading a function that, it turns out, no human ever actually understood. If your approval doesn’t mean “I understood this and I’d defend it,” it doesn’t mean anything.
What this means for you
If you’re adopting agents on a real team, the work is not “give everyone a coding agent.” It’s redesigning the workflow around review and trust:
- Write a real Definition of Done and make it the agent’s brief: acceptance criteria, tests required, docs updated. Spec-first, not diff-first.
- Mandate small PRs. Now that splitting work is nearly free, large diffs are a choice — usually a bad one.
- Invest in CI and tests as the trust layer. Coverage and determinism are no longer hygiene; they’re what lets you safely review faster.
- Keep humans on the architecture and the risky surfaces. Delegate implementation, own the design and the irreversible calls.
- Treat reviewer attention as a budget, not an infinite resource. If generation outpaces review, you don’t have a speed win — you have a growing queue and a quality cliff.
- Make the repo legible. Docs, boundaries, and invariants help the agent and the reviewer in equal measure.
The teams that win with agents won’t be the ones that generate the most code. They’ll be the ones that built the cheapest path from “an agent claims this works” to “we trust it enough to ship.” Speed at the keyboard is solved. Speed-with-confidence is the new game, and it’s still very much a human one.