· Charlie Holland · Architecture · 7 min read
Claude Code Is Brilliant. It’s Also Forrest Gump.
AI agents will get you to dev-done at record speed — right up until you realise you’re miles past the actual problem. 80% of projects still spend 80% of their time at 80% complete. Agents don’t fix that. They just help you reach it sooner.
Remember the football scene in Forrest Gump? He gets the ball, starts running, and doesn’t stop. He runs right through the end zone, past the bleachers, and keeps going. Someone has to physically hold up a banner to make him stop.
That’s most AI coding agents.
I use Claude Code daily. It’s genuinely brilliant. I’m building production systems with it — agent platforms, Kubernetes operators, evaluation frameworks. It writes Go, TypeScript, and Python at a level that would pass most code reviews. It understands architectural patterns. It can hold complex multi-file changes in its head and execute them coherently.
And it has absolutely no idea when to stop.
The Forrest Gump problem
AI agents don’t fail because they’re slow. They don’t fail because the code is bad. They fail because they don’t know when they’ve gone too far.
Ask Claude Code to fix a bug and it’ll fix the bug. Then it’ll refactor the surrounding code. Then it’ll add error handling you didn’t ask for. Then it’ll update the tests. Then it’ll notice the tests could be better structured and reorganise them. Then it’ll spot an inconsistency in a neighbouring module and fix that too. Before you know it, you’ve got a pull request that touches 40 files when you asked for a two-line fix.
The code is usually fine. That’s the dangerous part. It all looks reasonable. It all passes the tests. It all follows good engineering practices. But it’s solving problems you didn’t have, introducing changes you didn’t ask for, and creating a blast radius that makes the PR unreviewable.
Someone has to hold up the banner. And that someone is you.
The 80/80/80 problem
There’s an old observation in software — a riff on the Pareto principle, and a close cousin of Tom Cargill’s ninety-ninety rule — that 80% of projects spend 80% of their time at 80% complete.
The first 80% is the fun bit. The green field. The happy path. The architecture takes shape, the features come together, the demos look great. Everyone’s optimistic.
Then you hit the last 20%. The edge cases. The error handling. The performance under load. The security review. The deployment pipeline. The monitoring. The documentation. The thing where it works perfectly except when a user does that one thing nobody thought of. The integration with that legacy system that was supposed to be straightforward.
This is where projects live and die. Not in the first 80% — in the grind of the last 20%.
AI agents are phenomenal at the first 80%. They can scaffold a project, implement features, write tests, and produce “dev-done” code at a speed that would have seemed impossible two years ago. A senior engineer with good AI tools can produce in a day what used to take a team a week.
But agents don’t fix the 80/80/80 problem. They just help you reach 80% sooner.
Does “done-looking” code make the last 20% harder?
This is the question I genuinely want answered, because I’ve seen it go both ways.
The optimistic case: AI-generated code that’s well-structured and well-tested gives you a better starting point for the hard part. The scaffolding is solid. The patterns are consistent. You spend less time on boilerplate and more time on the genuinely hard problems. The last 20% is still hard, but you get there faster because the foundation is better.
The pessimistic case: AI-generated code creates an illusion of progress. You’ve got 40,000 lines of code that looks complete. It passes tests. It demos well. But nobody on the team fully understands it because they didn’t write it. The architectural decisions were made by an agent that optimised for “plausible-looking code” rather than “the right solution for this specific context.” When you hit the hard 20%, you’re debugging someone else’s work — except that someone isn’t available to explain their thinking.
In my experience, both cases are real. The difference is the human in the loop.
The human in the loop isn’t optional
When I use Claude Code well, the workflow looks like this:
I do the thinking. I understand the problem. I know the constraints. I’ve designed the approach. I know what “done” looks like and, critically, what “too much” looks like.
The agent does the execution. It writes the code, the tests, the infrastructure. Fast. The mechanical part — the typing, the boilerplate, the looking up of API signatures — is handled.
I review with intent. Not rubber-stamping. Actually reading the code, checking it against my mental model, pushing back on unnecessary additions, and asking “did I actually need this?”
When I skip the first step — when I give Claude Code a vague brief and let it run — I get the Forrest Gump outcome. Lots of running. Very fast. In roughly the right direction. But miles past where I needed to be, with a trail of “improvements” I now need to evaluate.
The agent is an amplifier. It amplifies good direction into great output. It also amplifies vague direction into a confident mess.
What this means for teams
If you’re a team lead or architect thinking about how to integrate AI agents into your workflow, here’s what I’d focus on:
The agent doesn’t replace the architect. It replaces the typing. The person who understands the problem, sets the direction, and knows when to stop is more important than ever. If anything, AI agents make architectural skill more valuable — because the cost of building the wrong thing has dropped to nearly zero, which means you’ll build a lot more wrong things if nobody’s steering.
Review culture matters more, not less. AI-generated PRs look plausible. They pass linters and tests. They follow patterns. The things that are wrong with them are subtle — unnecessary complexity, wrong abstractions, scope creep, architectural drift. You need reviewers who understand the system well enough to catch these things. Rubber-stamp reviews of AI code are how you end up with a codebase nobody understands.
Define “done” before you start. The single most effective thing you can do with an AI agent is give it a clear, bounded brief. “Fix this specific bug in this specific file” produces dramatically better results than “improve the error handling in this module.” Boundaries are the banner that stops Forrest running.
Measure outcomes, not output. AI agents produce a lot of output. That’s the point. But output isn’t the same as progress. Lines of code, number of PRs, velocity points — all meaningless if the last 20% still takes 80% of the time. Measure what matters: is the system reliable? Can the team maintain it? Does it solve the actual problem?
The honest answer
Do agents reduce the last 20%, or do they make it harder?
Both. It depends entirely on whether a human with good judgement is holding the banner.
In the hands of a senior engineer who understands the problem, AI agents are transformative. They eliminate the mechanical work and let you focus on the hard stuff. The last 20% is still hard — but you get there sooner and with a better foundation.
In the hands of someone who doesn’t understand the problem — or worse, a team that uses AI to avoid understanding the problem — agents produce a lot of confident, well-structured code that’s solving the wrong thing. The last 20% becomes harder because you’re further from the right answer, not closer.
Claude Code is brilliant. I use it every day. But like Forrest, it needs someone to tell it when to stop running.
That’s not a limitation of AI. That’s the job description for the humans who remain.
