Inspired by Nacho Lara’s recent post on agentic system design and SDLC transformation.
Nacho Lara recently wrote about the shifting bottleneck in software development, where the constraint has moved from the speed of coding to the capacity to review and validate that code. That resonated with me immediately. But the more I think about it, the more I suspect the shift goes even deeper than that. It’s not just that code review is now the bottleneck. It’s that the entire architecture of how we structure software delivery is changing.
And we’re noticing it in real time.
The pieces are all moving
Here’s what I’m finding: product is no longer just writing requirements. It’s writing definition files and creating specification artifacts (detailed prototypes, behavioural specs, acceptance-criteria frameworks) that become the starting instructions for agentic development. In parallel, QA isn’t waiting until code is done. It’s working alongside product to define what “good” looks like before a single line is written. And when code does get generated, multiple agents review it independently, not sequentially.
The SDLC isn’t getting simpler. It’s getting more distributed and more parallel.
What started as a question about code review speed has become a question about trust architecture. How do you organize teams and processes so that autonomous agents can move fast and you have real confidence in what’s being deployed?
Product definition becomes the seed
Let’s start with product. From what I’m seeing, product managers are writing more specs, not fewer, and they’re more detailed, because those specs become the actual instructions agents will execute. But here’s the key difference: the feedback loop is shorter.
Traditionally, product would write a spec, work with a UI/UX designer to create wireframes in Figma, and if you were lucky, end up with a clickable but rudimentary design. Now, product teams can write a spec, hand it to an agent, and have a working prototype to critique in hours. That tighter loop means specs get iterated faster. They’re tested against reality sooner. Which, paradoxically, means more spec-writing but higher-quality output.
The craft of product management isn’t diminishing. It’s just that the tools have changed what good looks like.
Testing as a first-class concern
One pattern that keeps coming up: testing has always been treated as an afterthought in most organisations. Even in shops that practice test-driven development, there’s a gravitational pull toward shipping features first and testing later. And frankly, that makes sense. End-to-end tests are expensive. They’re slow. Running a full test suite for a feature can take hours. So you feel less confident, you accept more risk, and you move forward anyway.
What if testing became a first-class concern from the moment product finishes its work?
I’m thinking about this as agentic test definition. Alongside the spec that product creates, a QA agent (or a human QA engineer working with an agent) could generate a test strategy. What metrics do we want to observe? What behavioral edge cases matter? What performance thresholds would indicate a problem? What does an acceptable rollback look like? All of that defined upfront, not as an afterthought.
The tests themselves might still be written and executed by agents as code is generated. But the thinking about what to test, and why, happens in parallel with product definition. Testing stops being a phase. It becomes part of the architecture.
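To make the idea concrete, here is a minimal sketch of what such a test-strategy artifact might look like as a structured definition that product and QA fill in before any code exists. All names (the `TestStrategy` class, the metric names, the example feature) are hypothetical illustrations, not a real tool:

```python
from dataclasses import dataclass, field

@dataclass
class MetricThreshold:
    name: str          # metric to observe once the feature is live
    max_value: float   # a breach indicates a problem

@dataclass
class TestStrategy:
    feature: str
    behavioural_edge_cases: list[str] = field(default_factory=list)
    performance_thresholds: list[MetricThreshold] = field(default_factory=list)
    rollback_criteria: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # The strategy is ready to hand to agents only when every
        # dimension has been defined upfront, not as an afterthought.
        return bool(self.behavioural_edge_cases
                    and self.performance_thresholds
                    and self.rollback_criteria)

strategy = TestStrategy(
    feature="checkout-discount-codes",
    behavioural_edge_cases=["expired code", "stacked codes", "zero-value cart"],
    performance_thresholds=[MetricThreshold("p95_latency_ms", 300.0)],
    rollback_criteria=["error rate above 1% for 5 minutes"],
)
print(strategy.is_complete())  # True
```

The point isn’t the data structure itself; it’s that the artifact forces the “what to test, and why” conversation to happen in parallel with product definition.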
Independent review and the multi-agent problem
Now we hit the bottleneck that Nacho was describing: code review. In a world where agents are generating significant portions of the codebase, a single engineer reviewing that code, or worse, the same agent that generated it reviewing its own work, is asking for trouble. Self-reinforcement is a real problem. An agent that made a particular architectural choice, a particular trade-off, won’t naturally second-guess that choice.
So the answer starts to look like multi-agent review. A different agent (or a team of agents) reviews the code independently, without having been part of its creation. This isn’t new thinking. Code review has always worked better when it’s not the author reviewing their own work. But at scale, with agentic systems, it becomes essential.
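A sketch of the orchestration logic might look like the following. The reviewer functions here are hypothetical stand-ins for calls to separate agents; the essential property is that each reviewer sees only the diff, never the author’s reasoning or the other reviewers’ verdicts:

```python
from typing import Callable

# A reviewer maps a diff to an approve/reject verdict.
Reviewer = Callable[[str], bool]

def independent_review(diff: str, reviewers: list[Reviewer],
                       required_approvals: int) -> bool:
    # Each reviewer runs independently, not sequentially: no verdict is
    # shown to another reviewer, which avoids self-reinforcement.
    approvals = sum(1 for review in reviewers if review(diff))
    return approvals >= required_approvals

# Illustrative stand-ins for real reviewing agents.
def style_agent(diff: str) -> bool:
    return "TODO" not in diff

def security_agent(diff: str) -> bool:
    return "eval(" not in diff

print(independent_review("refactor parser", [style_agent, security_agent],
                         required_approvals=2))  # True
```

In a real system the reviewers would be agents with different specialisations (security, performance, architecture), and the approval threshold becomes a tunable trust parameter.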
Tools like Claude Code Review are emerging as an answer to this. I should be honest: these tools are still expensive right now, and the field is early. But they point to something real. If you can’t have humans review all the code an agent generates (because there’s simply too much of it), you need another layer of agents that can review independently and at scale. This isn’t theoretical. I already know teams using this pattern in production.
The cost will decrease. The pattern will remain.
Governance at speed
Once code is deployed, you need a safety net. The specifics will look different depending on your context: a fintech processing payments has a very different risk tolerance than an internal tool. But the principle is the same: when deployment velocity increases this much, you need automated confidence gates.
None of the techniques here are new. Canary deployments, blue-green deployments, auto-rollback triggered by metric thresholds, progressive delivery with gates between stages. These have been around for years. The point isn’t inventing new deployment strategies. It’s making sure you’re actually using them, and that they’re designed to minimise disruption to your customers when things move this fast.
The conversation shifts from “should we do canary deployments?” to “what metrics trigger a rollback, and have we tested that response?”
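A rollback gate for a canary can be sketched in a few lines. The metric names and thresholds here are hypothetical; in practice they would come from your observability stack and be set per feature, ideally in the test strategy defined upfront:

```python
# Thresholds that trigger an automatic rollback of the canary.
THRESHOLDS = {
    "error_rate": 0.01,     # roll back above 1% errors
    "p95_latency_ms": 500,  # roll back above 500 ms p95 latency
}

def should_rollback(canary_metrics: dict[str, float]) -> bool:
    # Any single breached threshold pulls the canary: when agents
    # deploy this fast, the gate errs on the side of rolling back.
    return any(canary_metrics.get(name, 0.0) > limit
               for name, limit in THRESHOLDS.items())

print(should_rollback({"error_rate": 0.002, "p95_latency_ms": 420}))  # False
print(should_rollback({"error_rate": 0.030, "p95_latency_ms": 420}))  # True
```

The hard part isn’t this check; it’s agreeing on the thresholds ahead of time and actually rehearsing the rollback response.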
Teams and the shifting craft
A practical consequence of all this is that teams look different. Not in terms of roles (we still need product, engineering, QA) but in terms of structure and autonomy.
I think the traditional cross-functional Scrum team of 5-6 people will become less common over time. I suspect we’ll see teams getting smaller, more autonomous, and more vertically integrated: a single engineer (or a pair) with direct relationships to product and QA, owning a vertical slice of the system, working with agents to move fast. Senior technical leaders shift their focus from day-to-day coordination to designing the guardrails, the telemetry, and the rollback policies that keep the system safe. It’s a different kind of impact: multiplying the output of everyone else.
The craft is shifting. But engineers have always been problem solvers first. The coding was always a means to that end. Now the tools are catching up to that reality. I wrote more about this in AI Adoption Isn’t About Tools. It’s About Opportunity.
What I’m not sure about
I don’t think this is the full picture yet. There are pieces we’re still working through.
How much trust can you actually build in agentic systems at scale? The lived experience with real teams, real code, real users is still developing. You’re going to feel less confident about what’s being shipped when you can’t review it all yourself. That’s not a problem that disappears. It shifts. How do you organise systems and governance so that you can feel confident despite not doing the review yourself?
There’s a real risk that we ship features we don’t fully understand because an agent generated them and we trusted the multi-layer review process. That might be a good trade-off. But it’s a trade-off we’re making, and I think we should be explicit about it.
And does smaller, more autonomous, more vertically integrated actually work at the scale of large systems? I think it does. But large systems have coordination problems that small teams bump into. How do we solve that in an agentic world?
The bottleneck has moved from “can we code fast enough?” to “can we organise enough confidence and governance to let agents move fast safely?” That’s the architecture we’re building now, and these are the questions I’m sitting with.
If you’re thinking through the same things, I’d love to hear what you’re noticing.