My last 3 articles, I just couldn’t bring myself to publish. I was too close to the subject matter, so they came off a little negative, and not what this blog is for.
Moving on with new material, I may revisit them later in the year. So, let’s go!
I watched a guy on YouTube last month build a “full SaaS product” in 45 minutes. Login page, Stripe integration, dashboard, the works. The comments were naive: “This changes everything”, “I’m quitting my job”, “No more tutorial hell”.
The demo actually looked really good and much better than I expected. And I’ll say this upfront, vibe coding as a concept is pretty useful. When you’re prototyping at home, testing an idea, learning how a stack fits together, building something only you’ll use, it’s brilliant. A basic one-shot prompt can really speed ideas up. The barrier to building “working” software has basically collapsed, and that’s mostly a good thing.
Where it stops being brilliant is somewhere between “my idea works” and “I’ve pushed it live and strangers are paying me for it”. That’s where I feel the line is. Before it, you’re experimenting, proving a concept and having fun creating. After it, though, you’re accountable for looking after what you created and are being paid for. And what sits underneath most of these rapid-ship vibe-coded “products” is code that just isn’t ready to carry that amount of accountability.
This article is about what that underneath looks like and where things inevitably go wrong.
Vibe Coding – Idea Vs Production
I want to be careful here, because I don’t want to come across as cranky at the whole “vibe” movement. The democratisation of software creation is great in many ways. People who couldn’t create an app a year ago can now, or at least can create a proof of concept. Those who don’t get to code as much as they used to can fast track ideas they’ve always had. That’s a legitimately awesome shift, and there are plenty of good, low-risk uses for it.
The problem is in all the excitement, things get skipped.
Vibe coding at home is you, your ideas, your data, your risk. If it falls over, you shrug it off and try again without anybody else getting hurt. But vibe coding in production is other people’s data, other people’s money, and other people’s trust. If that falls over, real people cop it. You don’t get to walk away once you’ve taken payment. You’ve taken on a responsibility, whether or not you understand what is going on under the hood.
That’s the bit the ship-it-by-Monday content creators tend to skip over. Possibly as they themselves have no clue it even exists, but they are still selling “how-to’s” all the same. It’s also the bit where the security research gets mighty uncomfortable.

POC and Product Are Not the Same Thing
AI is good at generating things that look like software. Describe what you want in plain English, get a working prototype in minutes, via a one-shot prompt. I’ve done it myself plenty of times early on, it can be a bit of harmless fun.
But a proof of concept and a product are completely different creatures. A POC demonstrates that something can work. A product has to work safely, reliably, under load, with bad input, for a long time, and possibly under regulated compliance requirements, for strangers. The gap between those two things is where quality code and security live, and it’s the gap most vibe-coded applications never deal with.
Y Combinator reported that 25% of their Winter 2025 batch had codebases that were 95% AI-generated. That’s a quarter of one of the world’s best known startup accelerators running on code that, statistically, has a massive chance of shipping real security issues, affecting real users.
The Numbers Aren’t Great
Veracode’s 2025 GenAI Code Security Report tested over 100 large language models across Java, JavaScript, Python, and C#, using 80 coding tasks designed to surface common security weaknesses. The headline: 45% of AI-generated code contained vulnerabilities aligned with the OWASP Top 10. Not theoretical flaws, actual OWASP Top 10 things that compromise real applications every day.
Java was the worst performer at a 72% failure rate. Cross-Site Scripting sat at 86%. Overall, AI-generated code contained 2.74 times more vulnerabilities than human-written code. And it isn’t improving with scale. Worse, there is a misconception that newer, larger models produce more secure code than older ones. They certainly get better at writing code, but they are not better at writing safe code, not yet at least.
Cloud Security Alliance found that 62% of AI-generated code solutions contained design flaws or known vulnerabilities, even when developers used the latest models. Again, crap in crap out.
I believe we also enable it, in the excitement to get things done, to look good for our seniors and c-suites, hoping for a gold star and a pay rise.
‘Cos any one of those numbers alone should make you say damn. But together, they describe a consistent pattern. AI is optimising for execution, not safety.

What Actually Breaks
The general statistics land differently when you look at the specific ways this shows up in real code.
Hardcoded credentials are the one that keeps appearing. A vibe-coded platform exposed 1.5 million authentication tokens because they were embedded directly in source files rather than managed through a secrets store. It’s the kind of mistake any security review would catch in minutes, if a review had even been considered.
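To make the difference concrete, here is a minimal sketch of keeping a secret out of source files by reading it from the environment at startup. The names `load_secret`, `MissingSecretError` and `PAYMENT_API_KEY` are my own placeholders, and a real secrets store (a cloud secrets manager, a vault) would sit behind the same idea. Treat it as an illustration, not a full solution.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been configured."""


def load_secret(name, env=None):
    """Return the secret called `name` from the environment.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    The secret value itself is never put into the error message.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        # Fail loudly at startup rather than falling back to a baked-in default.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Hypothetical usage: the key name is an assumption for illustration only.
# api_key = load_secret("PAYMENT_API_KEY")
```

The point of failing at startup is that a missing secret becomes a deployment problem you notice, instead of a hardcoded fallback that quietly ends up in the repository.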
Arbitrary code execution is another. Testing revealed AI-generated code for a multiplayer game using Python’s pickle module to serialise network data without validation. That’s a well-known path to remote code execution. AI wrote it anyway, because it “worked”.
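For contrast, here is a rough sketch of what a safer version of that network handling could look like: JSON instead of `pickle`, with the payload checked before anything uses it. The message shape (`action`, `player_id`), the allowed actions and the size limit are all assumptions I’ve made up for the example, not details from the game that was tested.

```python
import json

MAX_MESSAGE_BYTES = 4096                    # assumed limit for this sketch
ALLOWED_ACTIONS = {"move", "chat", "quit"}  # hypothetical game actions


def decode_message(raw):
    """Decode and validate one network message, raising ValueError if it is bad.

    Unlike pickle.loads, json.loads only ever builds plain data
    (dicts, lists, strings, numbers), so decoding cannot run attacker code.
    """
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    try:
        data = json.loads(raw.decode("utf-8"))
    except (ValueError, RecursionError) as exc:
        # Covers invalid UTF-8, invalid JSON and absurdly deep nesting.
        raise ValueError("message is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("message must be a JSON object")

    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("unknown action")

    player_id = data.get("player_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(player_id, bool) or not isinstance(player_id, int) or player_id < 0:
        raise ValueError("player_id must be a non-negative integer")

    # Return only the fields we checked, dropping anything unexpected.
    return {"action": action, "player_id": player_id}
```

It is more lines than `pickle.loads(raw)`, which is exactly why a tool optimising for “it runs” tends not to write it.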
And perhaps the most telling pattern of all, AI agents “fixing” runtime errors by removing validation checks, relaxing database policies, or disabling authentication flows, because those controls were the thing throwing the error. In plain English, faking results to appease the human overlords.
When the tool writing the code treats “security control” as an “error to suppress”, you have a structural problem. If Anthropic’s Mythos is to be believed (and I do), these are bound to create headaches down the track, as people find and take advantage of these exploits.
Functional Doesn’t Mean Safe
This is the assumption that keeps catching out non-technical AI founders, ‘cos they just don’t know or don’t care enough to do proper due diligence. Code that runs is not necessarily code that’s secure. You can have a beautifully functional login page that stores passwords in plain text, where anyone who goes looking will find them.
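As a rough illustration of the alternative to plain text, here is a sketch of salted password hashing using only Python’s standard library. The storage format and function names are my own invention for the example, and the iteration count is an assumption you should check against current guidance. In a real product I’d expect a maintained library or your framework’s built-in auth to do this for you.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; check current guidance before relying on it


def hash_password(password, iterations=ITERATIONS):
    """Return a salted PBKDF2-SHA256 hash as a string that is safe to store."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Check a login attempt against a string produced by hash_password."""
    try:
        scheme, iterations_text, salt_hex, digest_hex = stored.split("$")
        iterations = int(iterations_text)
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(digest_hex)
    except ValueError:
        return False  # malformed record, treat as a failed login
    if scheme != "pbkdf2_sha256" or iterations < 1:
        return False
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

Both versions of the login page look identical from the outside, which is the whole problem: you can’t tell which one you shipped without reading the code.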
The assumption of correctness, the idea that if it runs it must be right, is exactly what vibe coding encourages. The whole premise is that you don’t need to understand the code. You describe the outcome, trust the output, and move on. To stay in the flow and get the idea out of your head.
Traditional development has its guardrails, whatever the methodology. Peer reviews, static analysis, QA, security testing, performance testing, etc. Not every team runs all of them, and not every team runs the ones they do well. But they exist as checkpoints or gates between the build and the deploy.
The Black Box Problem
From a QA perspective this is the part that really worries me.
When a developer writes code, they understand the logic. When something breaks, they can trace it, find the flaw, fix it. When AI generates code, the person who shipped it often can’t do any of that. The code is a complete black box. It works until it doesn’t and when it doesn’t, they’re stuck.
I have dealt with this when consulting. People using systems like Base44 or Claude Code (chat) reach a point where they can’t make sense of things and end up going in circles.
This isn’t about debugging convenience. It’s about the ability to spot risk. If you can’t read the code, you can’t review it. And you definitely can’t tell if the AI chose an insecure method over a secure one.
If you don’t know, you get the experts in, you don’t just ignore the problem or the risks.

The AI Tax
The high we get from shipping something in an afternoon is very real, if you let it. So is what comes after all the excitement wears off.
Around 66% of developers report a productivity tax from cleaning up AI-generated code that’s almost, but not quite right. Technical debt in vibe-coded projects accumulates roughly three times faster than in traditional development.
The rebuild cost for a vibe-coded prototype that needs to handle real users sits somewhere between $5,000 and $30,000, depending on how thoroughly it needs to be unpicked. And that’s only dealing with code.
A study of 600 HR professionals found two in three employers who cut headcount for AI reasons are already re-hiring. Nearly 36% have brought back half the roles they originally cut. Klarna reversed course after replacing 700 customer service workers and watching satisfaction drop off a cliff. IBM rehired after laying off 8,000 to stand up an AI-led HR function, having discovered the AI lacked the empathy and judgement the work actually required.
The pattern isn’t “AI is bad”. The pattern is “AI without the human judgement layer costs more than it saves”.
What Would Actually Help
I don’t think vibe coding goes away and I don’t think it can now. There are good uses for AI-generated code even in professional settings, and the accessibility it offers newcomers to software is a real gift.
The issue is the current default, where unvetted code moves straight from a prompt to a paying customer with no review, no testing, and no understanding of what’s underneath.
A few things would shift the picture.
AI coding tools need better default security behaviour. When a model generates code with a known vulnerability pattern, it should flag it or refuse to produce it in the first place. Security-focused prompting techniques like “self-reflection”, where the model reviews its own output for vulnerabilities, have been shown to reduce insecure code generation by 30 to 50%.
Deployment platforms need to own baseline security scanning. If you’re Vercel, Netlify, Base44, or any place absorbing the flood of AI-generated apps, automated vulnerability scanning at deploy time shouldn’t be optional.
Use static analysis tools like Semgrep or SonarQube, integrated into the pipeline, to flag unsafe functions, insecure deserialisation, and unchecked inputs as they appear. There are many others, they just need a little more love and attention.
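To show the kind of check these tools automate, here is a toy sketch using Python’s built-in `ast` module to flag a few risky calls in a source file. It is not how Semgrep or SonarQube work internally, and the list of risky calls is a small sample I picked for illustration. It also misses aliased imports such as `from pickle import loads`, which is one reason to use a real tool in the pipeline.

```python
import ast

# A deliberately tiny, illustrative rule set. Real tools ship hundreds of rules.
RISKY_CALLS = {"eval", "exec", "pickle.load", "pickle.loads", "os.system"}


def dotted_name(node):
    """Turn a call target like `pickle.loads` into the string 'pickle.loads'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_risky_calls(source):
    """Return a sorted list of (line_number, call_name) for each risky call found."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in RISKY_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


if __name__ == "__main__":
    sample = "import pickle\ndata = pickle.loads(blob)\nresult = eval(user_input)\n"
    for line, name in find_risky_calls(sample):
        print(f"line {line}: call to {name}")
```

A check like this only parses the code and never runs it, so it is cheap enough to sit in the pipeline and fail the build before anything reaches a paying customer.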
And individuals building with AI need to understand that “it works” is the starting line, not the finish line. If you’re handling payments or personal data, get a professional review. The cost of one good review is worth every cent to avoid the cost of a real breach.
The Take
Vibe coding as a practice isn’t the problem. At home, in prototypes, as a learning tool, it’s brilliant. The problem is the step that gets skipped when excitement pushes a prototype to an instant product.
45% of AI-generated code contains security vulnerabilities. Technical debt accumulates three times faster. Real breaches have already happened because of real decisions to ship unreviewed AI code to paying users.
None of that is an argument against AI-assisted development. It’s an argument for putting the same rigour around it that we’ve always put around production code.
Review. Scan. Test. Rinse & Repeat.
I don’t care much for stats, they can be manipulated. What years of development have shown me, though, is that due diligence and proper planning for quality work best.
Be safe out there.
A Note on Context
Every business and every project is different. What works in one place won’t work in another, and that’s the point.
Nothing here is meant to be a step-by-step prescription. It’s guidance, drawn from my own experiences, and deliberately kept general to avoid pointing fingers anywhere.
Take what’s useful, ignore what isn’t, and adapt it to your own context. Or as Joe Colantonio always says: “Test everything and keep the good.”

