The Hypocrisy of Agentic Coding Critics
Every week I read another post arguing that LLM coding agents are fundamentally untrustworthy. The code is buggy. The output needs reviewing. The agent doesn’t fully understand the domain. It hallucinates progress. It tests only the happy path.
All valid criticisms. I’ve been managing software engineers for years, and this list sounds awfully familiar.
Consider engineers, real ones with degrees and salaries and opinions about type systems and ORMs and Kubernetes. They:
- Produce code containing bugs. Every sprint. Consistently. 100% of the time.
- Provide cursory code reviews, approving changes they haven’t actually read.
- Begin work on features without fully understanding the domain, figuring they’ll learn as they go.
- Are optimistic about progress, sometimes to the point of fiction.
- Test only the happy path, because the edge cases are tedious and the deadline is Thursday.
- Push commits straight to production on a Friday afternoon without running the test suite, then disappear for the weekend.
None of this is controversial. Every engineering manager has lived through all of it. And none of it stops us from hiring engineers, trusting them with production systems, or building entire organisations around their output.
Instead, we built disciplines to compensate. Code review processes. QA teams. CI pipelines. Incident reviews. Post-mortems. The history of software engineering is largely the history of building systems to catch human mistakes before they reach production. We didn’t reject human engineers because they were fallible. We built structures around their fallibility because their contribution was worth the effort.
Now consider the common arguments against LLM coding agents:
- The code needs human review.
- The agent might introduce subtle bugs.
- It doesn’t understand the broader architecture.
- It can produce plausible-looking output that’s fundamentally wrong.
- It works best on the straightforward cases and struggles with nuance.
Read that list again. Swap “agent” for “new hire” and nothing changes. These are the same failure modes we’ve been managing in humans for decades. The difference is that when a human exhibits them, we call it a development opportunity. When an LLM exhibits them, we call it a fundamental flaw and reach for the ‘slop’ accusation.
So why the double standard?
I think the discomfort is less about the quality of LLM output and more about the loss of a familiar model. We know how to manage humans. We have intuitions about when someone is struggling, when they’re bullshitting, when they need support. We’ve built careers around those intuitions. An LLM doesn’t fit neatly into that model, and that’s unsettling.
But if you step back from the emotional response, the engineering problem is the same: you have a contributor that produces imperfect output, and you need processes to ensure that imperfect output doesn’t reach production unchanged. We solved this problem already. We solve it every time we onboard a new team member.
There’s also something worth acknowledging about professional identity. For many engineers, the craft of writing code is the job. The idea that a machine can do a version of it, even an imperfect version, feels like a challenge to something personal. That’s understandable, and I don’t think dismissing that feeling is helpful. But it shouldn’t be confused with a technical argument about capability, even though the technical argument is often what gets voiced in place of the emotional one.
We need to concentrate on structures, not perfection.
The value of a coding agent isn’t that it produces perfect code. Neither does anyone on your team. The value is that it produces reviewable code at a speed and volume that changes what’s possible.
Organisations built management structures around messy, nondeterministic humans because the value of human contribution was obvious despite the mess. The same logic applies here. The question was never whether the output is flawless. It never has been, for anyone. The question is whether you can build processes around the tool that capture its value while containing its weaknesses.
And in many cases, the processes already exist. Code review catches bugs regardless of who or what wrote them. CI pipelines don’t care about the author. Tests either pass or they don’t. Type checkers don’t have opinions. Linters are indifferent to feelings. The infrastructure we built to manage human fallibility works just as well for managing LLM fallibility.
In some cases it works better, because an LLM will never take your review comments personally, never push back on making the ‘right’ call because it’s Friday afternoon, and never quietly revert your suggested changes in a follow-up commit.
So, the more productive conversation isn’t “should we use LLM agents?” but “what structures do we need to adapt?” Some existing processes transfer directly. Others need rethinking.
Code review, for instance, needs to evolve. When a human writes code, the reviewer can assume a certain baseline of intent: the author understood the requirements, made deliberate choices, and can explain their reasoning. With LLM-generated code, those assumptions don’t hold. The reviewer needs to verify not just correctness but appropriateness. That’s a different skill, and it’s one we should be actively developing in our teams rather than using its absence as a reason to avoid the tool.
Similarly, testing becomes more important, not less. If you’re integrating LLM-generated code, your test coverage needs to be comprehensive enough to catch the kinds of mistakes LLMs characteristically make. This isn’t a new problem. It’s the same argument we’ve always made for good test coverage. The LLM just makes the case more urgent.
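To make that concrete, here’s a minimal sketch of the difference between happy-path and edge-case coverage. The function and its name (`normalise_discount`) are hypothetical, invented for illustration; the point is that the boundary tests are exactly the tedious ones any author, human or LLM, tends to skip.

```python
def normalise_discount(percent):
    """Clamp a discount percentage to the range 0-100."""
    if percent < 0:
        return 0
    if percent > 100:
        return 100
    return percent

# The happy-path test: the one an optimistic author writes first.
assert normalise_discount(25) == 25

# The edge cases: tedious, but this is where characteristic
# mistakes (off-by-one comparisons, unclamped inputs) show up.
assert normalise_discount(0) == 0      # lower boundary
assert normalise_discount(100) == 100  # upper boundary
assert normalise_discount(-5) == 0     # below range
assert normalise_discount(150) == 100  # above range
```

The test suite doesn’t know or care whether the clamping logic came from a senior engineer or an agent; it only knows whether the boundaries hold.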
The real risk isn’t that LLMs produce imperfect code. It’s that teams adopt them without the structures that make any contributor safe. An LLM without code review is dangerous. So is a human without code review. The failure mode is identical: unreviewed code reaching production.
If your processes can’t catch bad code regardless of its source, the problem isn’t the LLM. The problem is your processes.
If you’re going to argue that LLMs aren’t ready for production engineering work, I think you need to grapple with why the same flaws are acceptable in human engineers. If the answer is “humans understand what they’re doing,” I’d gently suggest sitting in on a few more code reviews. Understanding is a spectrum, and a significant amount of production code was written by people who were working it out as they went.
The honest assessment isn’t that LLMs are reliable. They are not, and pretending otherwise does everyone a disservice. But they’re a new kind of contributor with a familiar set of failure modes, and we already have decades of experience managing those failures in humans. The question is whether we’re willing to adapt our playbook, or whether we’d rather pretend that human-only engineering was working flawlessly all along.
Because it wasn’t. We just got used to it.