Developer-led AI vs vibe coding

Not all AI-written code is equal. Developer-led AI ships well. Prompt-only 'vibe coding' falls apart by month six. How to tell the difference.

pragmatic-ai ai developer-led-ai vibe-coding owner nz

29 April 2026 Urban Lightbulb

The industry has collapsed all AI-written code into one yes-or-no question, "Did you use AI?", as if the answer tells you anything about what you're buying.

It doesn't. There's a sharp line between AI used by an experienced engineer (as a peer, with review, tests, and architectural judgement around the output) and AI used as a substitute for that engineer ("vibe coding", prompt-only builds). The first ships better software than humans can write alone. The second demos well and falls over in month six.

This article is about that line, why it matters, and how to tell which side of it your developer is actually on.

What we mean when we say we use AI

First, the disclosure, because it matters for how the rest of this reads. Urban Lightbulb runs AI heavily. Claude Code is part of our daily workflow. The majority of our modern code starts as AI output. We're not writing this piece to sell you a handcrafted-artisanal-bespoke position. We think AI + an engineer produces better software than either alone, and we'll show our working.

The line we're drawing isn't AI-on vs AI-off. It's whether there's an engineer in the loop at all.

Vibe coding: the term and the problem

The phrase comes from Andrej Karpathy, who posted it on X in February 2025: "a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists... I 'Accept All' always, I don't read the diffs anymore." Karpathy meant it half-affectionately, describing a style that's useful for weekend projects. The industry picked it up and ran with it, and it now names a specific failure mode: a non-engineer prompts an AI to "build me a SaaS", accepts everything, and ships the result.

The output of that process isn't what you think it is. It's code that's optimised to pass the demo. Happy-path features work. Buttons do what buttons should do. The screenshots in the pitch deck are real. Several things are missing. There's no error handling for the inputs outside the happy path. There's no security consideration that would have closed an injection vector. There's no architectural decision that would let the next feature be added without rewriting the first one.
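Here is what "an injection vector" looks like in practice. This is a minimal sketch using Python's standard-library `sqlite3` module and a hypothetical `users` table. The first lookup pastes user input straight into the SQL text, which works perfectly in a demo. The second uses a parameterised query, so the input is only ever treated as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("ana@example.com",), ("tama@example.com",)],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, email: str) -> list:
    # Happy-path version: user input becomes part of the SQL text itself.
    query = f"SELECT id, email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, email: str) -> list:
    # Parameterised query: the driver binds email as a value, never as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    attack = "' OR '1'='1"
    print(len(find_user_unsafe(conn, attack)))  # 2: every row leaks
    print(len(find_user_safe(conn, attack)))    # 0: no user has that email
```

Both functions return the same result for a normal email address. That is exactly why a demo can't tell them apart.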

A Stanford study published in 2023 found participants with access to an AI coding assistant wrote significantly less secure code than those without, and were more likely to believe their code was secure than it actually was. The effect was strongest in the participants who trusted the AI most and engaged with the prompts least. That's the shape of the problem.

Illustration: a sketched pipeline, AI generates code → human reads → second AI review pass → tests run → refactor → merge.

What developer-led AI actually looks like

An engineer uses AI differently from the start, before a single line is generated. The prompt itself is technical: type signatures, framework idioms, architectural constraints, the specific pattern the team uses in this part of the codebase. The AI gets context a non-engineer can't provide, and the output is proportionally better.

From there, the AI functions as a peer developer, not a sole developer. The engineer uses it as a sounding board for architecture decisions, pressure-tests trade-offs and asks for counterarguments to their own design. It's a technical conversation, not a request-and-receive transaction. The code the AI then writes moves through a rigorous process: automated review passes by separate AI agents, a full test suite, CI checks, and human review at the points where judgement actually changes the outcome (architectural decisions, commit boundaries, pull requests).

It's worth being clear about what that process does and doesn't require. The engineer doesn't need to watch every line being written in real time. With modern models, that's quickly becoming theatre. What they do need is to own the process that ensures the output meets the bar, and to be accountable when it doesn't. That's a different job from being the eyeball on every diff. An engineer who sets up rigorous AI review, comprehensive tests, CI that actually catches regressions, and a commit discipline that forces review at the right moments can let the AI run for hours and still ship better code than they'd write by hand.
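The process described above reduces to a merge gate. Nothing merges until every automated stage has passed and an accountable engineer has signed off where judgement matters. Below is a minimal Python sketch of that rule. The stage names and the `ChangeSet` record are hypothetical, and this is not the API of Claude Code or of any CI product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stage names; a real pipeline will name and order these differently.
AUTOMATED_STAGES = ("ai_review", "tests", "ci")


@dataclass
class ChangeSet:
    """A hypothetical record of one AI-assisted change on its way to merge."""

    description: str
    touches_architecture: bool = False
    stage_results: Dict[str, bool] = field(default_factory=dict)
    design_reviewed: bool = False       # an engineer has signed off the design
    approved_by: Optional[str] = None   # accountable engineer at the PR boundary


def merge_blockers(change: ChangeSet) -> List[str]:
    """Return every reason the change can't merge yet; an empty list means it can."""
    # Automated stages: a missing result counts as a failure, never a pass.
    blockers = [
        f"{stage} has not passed"
        for stage in AUTOMATED_STAGES
        if not change.stage_results.get(stage, False)
    ]
    # Human judgement on design decisions...
    if change.touches_architecture and not change.design_reviewed:
        blockers.append("architectural change needs a design review")
    # ...and an accountable engineer at the pull-request boundary.
    if change.approved_by is None:
        blockers.append("no engineer has approved this change")
    return blockers


if __name__ == "__main__":
    change = ChangeSet(
        "Add invoice export",
        stage_results={"ai_review": True, "tests": True, "ci": False},
    )
    print(merge_blockers(change))
    # ['ci has not passed', 'no engineer has approved this change']
```

Note where the human sits in this sketch: not watching every line, but owning the gate. A missing stage result blocks the merge instead of passing by default.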

That pipeline sounds like overhead. It isn't. A 2024 study pooled three field experiments, at Microsoft, Accenture and an anonymous Fortune 100 company. It found that professional developers given an AI coding assistant completed 26 percent more tasks than the control group. Those were engineers using AI in their normal jobs, not AI working alone. The paired workflow is faster than solo human coding, and with the review pipeline around it, better too. What it isn't is prompt-and-ship.

What the human brings that the AI can't

Here's the part most of the discourse misses. The AI is extraordinary at the code in front of it. It's blind to everything else.

An engineer holds the shape of the system in their head. They know which assumption in the auth layer is going to break when you add a second tenant next quarter. They know that a query that runs fine without an index at a thousand rows will grind to a halt at a hundred thousand. They know that an innocuous refactor in module A will cascade into a bug in module C, because the two communicate through a contract that isn't written down anywhere. The AI sees the file it's been given. The engineer sees the next ten features, the scale curve, and the blast radius of every change.
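The scale curve in that paragraph is easy to demonstrate. This minimal sketch uses Python's standard-library `sqlite3` module and a hypothetical `orders` table. `EXPLAIN QUERY PLAN` shows the lookup scanning every row until an index exists on the filtered column; after that, it searches the index instead. A full scan over a thousand rows is invisible. Over a hundred thousand rows, on every page load, it isn't. SQLite's exact plan wording varies between versions, so the sketch prints the plan rather than depending on its exact text.

```python
import sqlite3
from typing import Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    # EXPLAIN QUERY PLAN rows have four columns; the last is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


def demo() -> Tuple[str, str]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 500, i * 1.5) for i in range(1000)],
    )
    lookup = "SELECT id, total FROM orders WHERE customer_id = ?"

    before = query_plan(conn, lookup, (42,))  # no index: scans the whole table
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, lookup, (42,))   # index: searches straight to matches
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("without index:", before)
    print("with index:   ", after)
```

Nothing in the first plan is wrong at demo scale, which is the point: the problem only shows up once the data grows.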

That's what "human in the loop" actually means at Urban Lightbulb, and why we treat it as a load-bearing phrase rather than a marketing caveat. The AI handles the code. The engineer holds the boundaries: what can break, when, and why. Everything interesting about how software ages lives in those boundaries.

Illustration: month 1 vs month 6. On the left, a polished launch-day demo; on the right, the same app covered in warning icons and bug reports, with a frazzled developer trying to patch it.

The signal isn't AI-on or AI-off

The industry keeps arguing about whether AI should be used in production code at all. That's the wrong axis. The real question is whether there's an engineer in the loop holding the boundaries. Ninety percent AI-written code with a rigorous process and an accountable engineer is better than twenty percent AI-written code with no process at all. The volume tells you nothing; the pipeline around it tells you everything.

Paired AI is better than solo human code

This is the part the anti-AI side of the conversation won't concede, and it matters. When an engineer is driving, AI finds things humans miss.

Google's Big Sleep project, a collaboration between Project Zero and DeepMind, published results in late 2024. Google described them as the first public example of an AI agent finding a previously unknown, exploitable memory-safety vulnerability in widely used real-world software: a stack buffer underflow in SQLite. The bug was caught in a development branch before it reached an official release, and Google reported that existing fuzzing had not found it. Follow-up reports in 2025 said the system had gone on to find around twenty vulnerabilities across open-source projects.

That's not AI replacing humans. It's AI paired with security engineers, catching a bug that review and fuzzing hadn't. The same principle applies in application code. The AI flags what the engineer overlooked; the engineer catches what the AI got wrong. The AI suggesting alternatives pressure-tests the engineer's default choices. The output is better than either working alone.

The 2024 Stack Overflow Developer Survey backs up the adoption curve: 76 percent of developers are using or planning to use AI tools. The scepticism is real too: only 43 percent trust the accuracy of AI output. Heavy use combined with guarded trust is exactly the posture that makes the paired workflow work.

Where prompt-only damage shows up

Months two through six. Never launch day.

On launch day everything looks fine, because the features the demo showed off are the features the prompts asked for. The bugs arrive when real users do things the happy path didn't consider: a Unicode character in a name field that crashes the PDF generator, a session that stays active across two browser tabs and corrupts the state, a query that's instant on a developer's laptop and times out on the production database.
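The Unicode case is worth making concrete. In New Zealand it isn't an edge case at all, because the macrons in te reo Māori names sit outside the Latin-1 character range. This is a minimal Python sketch. It assumes a hypothetical PDF step whose built-in fonts only accept Latin-1 text; check what your own generator actually supports. The happy-path version crashes on the first macron. The defensive version checks first and routes the name to a Unicode-capable path, also hypothetical, without stripping any characters.

```python
import unicodedata


def encode_name_naive(name: str) -> bytes:
    # Happy-path assumption baked in: every name fits in Latin-1.
    return name.encode("latin-1")


def fits_latin1(text: str) -> bool:
    # Compose first, so "e" + combining diaeresis counts as the single "ë".
    composed = unicodedata.normalize("NFC", text)
    # Latin-1 covers exactly the code points U+0000 to U+00FF.
    return all(ord(ch) <= 0xFF for ch in composed)


def choose_font(name: str) -> str:
    """Pick a (hypothetical) font path instead of crashing mid-render.

    Names outside Latin-1 go to an embedded Unicode font with their
    characters intact; stripping the macron would misspell the name.
    """
    return "builtin-latin1" if fits_latin1(name) else "embedded-unicode"


if __name__ == "__main__":
    for name in ["Zoë Smith", "Tāmati Ngāta"]:
        print(name, "->", choose_font(name))
    try:
        encode_name_naive("Tāmati Ngāta")
    except UnicodeEncodeError as err:
        print("naive version crashed:", err.reason)
```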

GitClear's 2024 analysis of 153 million changed lines of code looked at code churn: lines reverted or updated within two weeks of being written. It projected that churn would double in 2024 compared with the pre-AI 2021 baseline. Its 2025 follow-up found that, for the first time, copy-pasted lines outnumbered moved lines, which GitClear uses as its proxy for refactoring. That's what happens when the pipeline is thin: shipped code gets unshipped faster, and the codebase accumulates patterns that won't survive the next feature.
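The churn definition above is simple enough to sketch. This minimal Python version counts a line as churned if it was reverted or updated within two weeks of being written. It works over a hypothetical list of line records. GitClear's own methodology is more involved, so treat this as an illustration of the idea, not a way to reproduce their numbers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# The two-week window from the churn definition.
CHURN_WINDOW = timedelta(days=14)


@dataclass
class LineRecord:
    """Hypothetical history of one authored line of code."""

    written_at: datetime
    changed_at: Optional[datetime] = None  # first update or revert, if any


def is_churned(line: LineRecord, window: timedelta = CHURN_WINDOW) -> bool:
    # Churned = reverted or updated within `window` of being written.
    return line.changed_at is not None and line.changed_at - line.written_at <= window


def churn_rate(lines: List[LineRecord]) -> float:
    """Fraction of lines that churned; 0.0 for an empty history."""
    if not lines:
        return 0.0
    return sum(is_churned(line) for line in lines) / len(lines)


if __name__ == "__main__":
    t0 = datetime(2024, 3, 1)
    history = [
        LineRecord(t0, t0 + timedelta(days=3)),   # rewritten after 3 days: churn
        LineRecord(t0, t0 + timedelta(days=40)),  # changed much later: maintenance
        LineRecord(t0),                           # never touched again
        LineRecord(t0, t0 + timedelta(days=10)),  # reverted after 10 days: churn
    ]
    print(churn_rate(history))  # 0.5
```

A change made a month later doesn't count. Churn measures code that barely survived contact with reality, not normal maintenance.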

When prompt-only is fine

Worth saying plainly, because a blanket condemnation would be dishonest: prompt-only builds are fine for throwaway prototypes. A weekend proof-of-concept to show a cofounder the shape of an idea. An internal tool that three people will use to automate a repetitive task, with no customer data and no compliance surface. A rough test of whether a feature is worth building properly. Karpathy was right that this style of coding has a place. It becomes a problem when the output is treated as the real build.

The line is the same one we keep returning to. If the output has to survive scale, users, edge cases, or security audits, you need an engineer in the loop. If it doesn't, you don't.

Illustration: a person at a workbench, deliberately using AI tools laid out like a tradesperson's kit (chisel, saw, measuring tape). Tools, not magic.

How to tell across the table

When you're shortlisting developers, the question isn't "do you use AI?" (everyone does, or is about to). The question is what the pipeline looks like around it.

A good answer sounds like this: we use AI heavily; we have a rigorous process around it (automated AI review passes, full test coverage, CI that catches regressions, human review at commits and at architectural decisions); and an engineer is accountable for the output. The pipeline has named stages. The developer can describe what happens if the AI and the engineer disagree about a design decision (the engineer wins, with reasons).

Notice what's not on that list: "we watch every line as it's written." Some teams still do that, and that's fine. But the signal to listen for is the process, not the real-time monitoring. An engineer who's built a rigorous pipeline and owns the outcome is doing developer-led AI, whether they're watching every diff or letting the AI run for twenty minutes while they're deep in an adjacent file.

A red-flag answer sounds like either extreme. "We don't use AI, we handcraft everything" means the team is behind the curve and more expensive for the same output. "We prompt it and ship what comes out, it's way faster now" means no pipeline and no engineer in the loop, and you're going to find the bugs in month four. Either answer tells you the developer hasn't done the thinking this tool requires.

The pricing and choosing questions in our earlier article cover the rest of the vendor-selection ground. This one is just the AI-specific question, because it's the one most owners don't know how to ask yet.

The real question

Is there an engineer in the loop?

Everything else (the volume of AI, the specific tool, the prompting style) is a detail. The answer to that question predicts whether the software you're paying for will still be working in month twelve.