Demo ≠ production. Practical hardening for vibe coded apps: auth boundaries, secrets, error handling, and minimal tests that stop regressions before store release.

Published June 2026 by Batteries Included

A working demo on your device is not a production app. The gap isn't signing or store policies; those come later. The gap is that your demo was never tested with real accounts, real users, or real edge cases. Hardening means adding the boundaries, secrets management, and guardrails that keep the next AI prompt from silently breaking what already works.

Demo vs. production: what breaks first

You ran the app a hundred times. It works. Then your first beta tester logs in on a different device and hits a blank screen.

Here's what actually changes when real people use your app:

Real accounts, not your test account. Your demo probably has one user: you. The moment you add a second account, you discover whether your API actually enforces data isolation or just returns everything and relies on the UI to filter it. Spoiler: AI-generated backends often do the latter.

Second device, different session state. Token expiry, logout behavior, what happens when someone installs the app fresh. These paths exist in theory but never got tested. Most vibe coded apps skip session edge cases entirely.
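The token-expiry path is easier to test once the decision is a pure function. Below is a minimal sketch. It assumes your auth layer exposes an access-token expiry timestamp and tells you whether a refresh token exists. The `SessionSnapshot` shape and the 60-second skew are hypothetical choices, not any SDK's API.

```typescript
// Hypothetical session shape; adapt it to whatever your auth SDK actually returns.
interface SessionSnapshot {
  accessTokenExpiresAtMs: number | null; // null = no session (fresh install or after logout)
  hasRefreshToken: boolean;
}

type SessionAction = "use" | "refresh" | "login";

// Treat tokens as expired slightly early so a request doesn't leave with a token
// that expires in flight. 60 seconds is an arbitrary illustrative margin.
const CLOCK_SKEW_MS = 60 * 1000;

function decideSessionAction(s: SessionSnapshot, nowMs: number): SessionAction {
  if (s.accessTokenExpiresAtMs === null) {
    // Fresh install or logged out: there is nothing to refresh.
    return "login";
  }
  if (nowMs < s.accessTokenExpiresAtMs - CLOCK_SKEW_MS) {
    return "use";
  }
  // Expired or about to expire: refresh if possible, otherwise send the user
  // to login instead of showing a blank screen.
  return s.hasRefreshToken ? "refresh" : "login";
}
```

Each branch (`use`, `refresh`, `login`) now becomes one unit test. You no longer have to reproduce it by hand on a second device.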

Release builds behave differently from debug builds. Code shrinking, optimizations, and signing can surface bugs that your debug run never triggered. This is covered in the store release checklist. Keep it in mind but don't spend time on it until hardening is done.

API keys in the client. Every secret hardcoded in your app binary or committed to your repo is readable by anyone who knows where to look. This isn't theoretical: a 2025–2026 scan of over 5,600 apps built with AI coding tools found 400+ exposed secrets (API keys, credentials, and tokens) in production apps, per OX Security's summary of Escape.tech findings. Once a key is in a client bundle, it's compromised.

No error handling on real input. AI-generated code optimizes for the happy path. Users don't. They will enter unexpected input, lose connectivity mid-request, tap twice, and close the app at the wrong moment. Without error handling, they see blank screens, silent failures, or partial state that's impossible to recover from.
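Double taps are one of the cheapest failure modes to guard against. Below is a minimal sketch of a single-flight wrapper around any promise-returning action, such as a hypothetical `submitOrder`. While one call is in flight, repeated calls get the same promise instead of starting a second request.

```typescript
// Wraps an async action so concurrent calls share one in-flight promise.
// A second tap while the first request is pending does not send a duplicate.
function singleFlight<T>(action: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight === null) {
      inFlight = action().then(
        (value) => {
          inFlight = null; // settled: the next tap may start a new request
          return value;
        },
        (error: unknown) => {
          inFlight = null; // also clear on failure so the user can retry
          throw error;
        },
      );
    }
    return inFlight;
  };
}
```

Usage would look like `const onSubmit = singleFlight(() => submitOrder(cart));`, wired to the button. The screen still needs to show a pending state, but the backend no longer sees two orders.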

Prompt roulette. Without tests, every prompt you send to fix one thing might break two others. You won't know until a user hits the broken path.

Minimum hardening checklist

Do these before any beta users see the app. Not all at once; prioritize by what's most likely to cause data loss, auth failures, or security issues first.

1. Secrets and config off the client

No API keys, tokens, or credentials in your app binary or source repo. Anything the client needs to call an external service should go through a backend you control, not a key bundled into the app.

If you're using a managed backend (Supabase, Firebase, etc.), understand which keys are publishable vs. secret, and confirm your Row Level Security or equivalent is actually enabled and tested with a second account.
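For the backend-you-control route, here is a minimal sketch of the server side. It assumes a hypothetical upstream service that expects a bearer key. The client sends only its own request payload, and the server attaches the secret it loaded from its own configuration. `ClientPayload`, `buildUpstreamRequest`, the URL, and the length limit are all illustrative.

```typescript
// Runs on your server, never in the app. The secret comes from server-side
// config (environment variable, secret manager) and is passed in here.
interface ClientPayload {
  prompt: string;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

const UPSTREAM_URL = "https://api.example.com/v1/generate"; // hypothetical endpoint
const MAX_PROMPT_CHARS = 2000; // illustrative limit so the proxy isn't a free relay

function buildUpstreamRequest(payload: ClientPayload, serverSecret: string): UpstreamRequest {
  if (serverSecret.length === 0) {
    // Fail loudly on the server instead of sending an unauthenticated call.
    throw new Error("Upstream API key is not configured on the server");
  }
  if (payload.prompt.length === 0 || payload.prompt.length > MAX_PROMPT_CHARS) {
    throw new Error("Invalid prompt length");
  }
  return {
    url: UPSTREAM_URL,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${serverSecret}`, // added here, so the app binary never contains it
    },
    // Forward only the fields you expect; don't pass the client's body through verbatim.
    body: JSON.stringify({ prompt: payload.prompt }),
  };
}
```

A real handler would also authenticate the caller and rate-limit per user before building this request, because the proxy now spends your key on their behalf.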

2. Auth and data isolation that matches your product rules

Log in as user A. Try to access user B's data by guessing IDs, modifying requests, or hitting API endpoints directly. If you can, your auth boundaries are broken.

OWASP Mobile Top 10, M3 (Insecure Authentication/Authorization), explicitly notes that “backend systems should independently verify the roles and permissions of the authenticated user” rather than rely on anything that comes from the client. AI-generated code frequently skips server-side enforcement and trusts client-supplied user IDs.
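Below is a minimal sketch of what server-side enforcement looks like. It assumes an in-memory store and a session object that your auth middleware has already verified. `VerifiedSession`, `getNote`, and the store are hypothetical. The user ID comes from the verified session, never from the request. A mismatch returns the same 404 as a missing record, so IDs can't be probed.

```typescript
interface VerifiedSession {
  userId: string; // set by auth middleware after verifying the token, not by the client
}

interface Note {
  id: string;
  ownerId: string;
  text: string;
}

type Result = { status: 200; note: Note } | { status: 404 };

// Hypothetical in-memory store standing in for your database.
const notes = new Map<string, Note>([
  ["n1", { id: "n1", ownerId: "user-a", text: "A's note" }],
  ["n2", { id: "n2", ownerId: "user-b", text: "B's note" }],
]);

function getNote(session: VerifiedSession, noteId: string): Result {
  const note = notes.get(noteId);
  // Ownership is checked on the server. Returning 404 for "not yours" as well as
  // "doesn't exist" avoids confirming which IDs are valid.
  if (note === undefined || note.ownerId !== session.userId) {
    return { status: 404 };
  }
  return { status: 200, note };
}
```

The manual check above (log in as A, request B's data) becomes a one-line assertion: `getNote({ userId: "user-a" }, "n2")` should return `{ status: 404 }`.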

3. Input validation and error surfaces users can understand

Every form, API call, and user action needs two things: validation that rejects garbage input before it hits your backend, and an error message the user can actually act on. “Something went wrong” is not sufficient. “Your session expired. Tap here to log in again” is.

Don't rely on the AI to add these after the fact. Add input validation and error handling deliberately, starting with your core flows.
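Below is a minimal sketch of both halves. It assumes a sign-up form with an email field and a small set of failure kinds. The `AppError` union and the message wording are illustrative. Validation runs before the request, and every failure maps to a message that tells the user what to do next.

```typescript
type AppError =
  | { kind: "sessionExpired" }
  | { kind: "offline" }
  | { kind: "validation"; field: string; reason: string }
  | { kind: "server"; status: number };

// Deliberately simple check: catches empty and obviously malformed input.
// The backend must still validate; client checks are for fast feedback only.
function validateEmail(raw: string): AppError | null {
  const email = raw.trim();
  if (email.length === 0) {
    return { kind: "validation", field: "email", reason: "Enter your email address." };
  }
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    return { kind: "validation", field: "email", reason: "That doesn't look like an email address." };
  }
  return null;
}

// Every error kind maps to a message the user can act on.
function userMessage(error: AppError): string {
  switch (error.kind) {
    case "sessionExpired":
      return "Your session expired. Tap here to log in again.";
    case "offline":
      return "You're offline. Check your connection and try again.";
    case "validation":
      return error.reason;
    case "server":
      return error.status >= 500
        ? "Our server had a problem. Please try again in a moment."
        : "We couldn't complete that request. Please check your input and try again.";
  }
}
```

Because `AppError` is a closed union, adding a new failure kind without a message becomes a compile error under strict TypeScript. You don't have to discover the gap at runtime.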

4. One module boundary: what AI can touch vs. what's frozen

Pick the 2–3 files that handle auth, payments, or data access and mark them as off-limits for casual AI prompts. Write a comment at the top, tell your team, put it in your notes; whatever your workflow supports. The goal is a boundary you enforce manually so that a refactor prompt doesn't accidentally regenerate your auth logic.

This is a process constraint, not a code pattern. It buys you room to keep moving fast on UI and features while keeping the critical paths stable.
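You can make the manual boundary harder to forget by adding a pre-commit hook or CI step that flags changes touching a frozen file. Below is a minimal sketch of the check itself. It assumes you feed it the list of changed paths (for example from `git diff --name-only`) and keep the frozen list in the repo. The paths shown are hypothetical.

```typescript
// Hypothetical frozen paths: the 2-3 files that own auth, payments, and data access.
const FROZEN_PATHS: readonly string[] = [
  "src/auth/session.ts",
  "src/payments/checkout.ts",
  "src/data/accessPolicy.ts",
];

// Returns the frozen files touched by a change so a hook can print a warning
// (or require an explicit confirmation step) before the commit goes through.
function touchedFrozenFiles(changedPaths: readonly string[]): string[] {
  const frozen = new Set(FROZEN_PATHS);
  return changedPaths
    .map((p) => p.trim())
    .filter((p) => frozen.has(p));
}
```

The check doesn't stop anyone. It turns an edit to a frozen file into a deliberate decision rather than a side effect of a refactor prompt.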

5. Minimal automated tests on money paths

You don't need 100% coverage. You need tests on the flows where a silent regression is costly: login and logout, the core user action, payment or subscription if you have one.

Even 5–10 tests that run in CI will catch the regressions that prompt-based development creates constantly. The goal isn't completeness; it's a safety net so you know immediately when a fix breaks something.
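Below is a minimal sketch of one of those tests: a hypothetical `hasPremiumAccess` entitlement rule plus a few dependency-free assertions. In a real project you would use your stack's test runner (Jest, Vitest, XCTest, JUnit). Each assertion pins down one behavior that a prompt could otherwise change silently.

```typescript
interface Subscription {
  status: "active" | "canceled" | "past_due";
  currentPeriodEndMs: number; // end of the period the user has paid for
}

// Hypothetical product rule: canceled users keep access until the paid period
// ends; past_due users lose access immediately.
function hasPremiumAccess(sub: Subscription | null, nowMs: number): boolean {
  if (sub === null) return false;
  if (sub.status === "past_due") return false;
  return nowMs < sub.currentPeriodEndMs;
}

// Minimal assertion helper so the example runs without a test framework.
function check(label: string, actual: boolean, expected: boolean): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

const now = Date.UTC(2026, 0, 15);
const periodEnd = Date.UTC(2026, 1, 1);

check("no subscription", hasPremiumAccess(null, now), false);
check("active, in period", hasPremiumAccess({ status: "active", currentPeriodEndMs: periodEnd }, now), true);
check("canceled, still in paid period", hasPremiumAccess({ status: "canceled", currentPeriodEndMs: periodEnd }, now), true);
check("canceled, period over", hasPremiumAccess({ status: "canceled", currentPeriodEndMs: periodEnd }, periodEnd + 1), false);
check("past due", hasPremiumAccess({ status: "past_due", currentPeriodEndMs: periodEnd }, now), false);
```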

6. Types or strict lint where they prevent the bugs you've already seen

This is stack-specific, but the principle is the same everywhere: if a category of bug has already bitten you, make the compiler or linter catch it automatically.

  • TypeScript: enable strict mode ("strict": true in tsconfig.json). It catches the null-reference bugs and missing checks that AI-generated code consistently contains.
  • Swift: treat warnings as errors in your Release build configuration. @MainActor annotations and optionals guard against the concurrency bugs that surface under load and against nil-dereference crashes.
  • Kotlin: lean on the type system's nullability. The !! operator tells the compiler to trust that a value isn't null and throws a NullPointerException at runtime if it is; AI-generated code often uses it with no check at all. Find those uses and replace them with proper null handling.

Adding strict types after the fact is tedious, but it's far cheaper than chasing runtime crashes with real users.
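For TypeScript, the bug class that strict mode catches most often is an unchecked lookup. Below is a minimal sketch with a hypothetical `User` type. Under `strictNullChecks` (part of `strict`), `Array.prototype.find` returns `User | undefined`, so the compiler rejects `user.name` until the missing case is handled.

```typescript
interface User {
  id: string;
  name: string;
}

// With "strict": true in tsconfig.json, writing `return users.find(...).name`
// is a compile error, because find() can return undefined.
function displayName(users: readonly User[], id: string): string {
  const user = users.find((u) => u.id === id);
  if (user === undefined) {
    // The missing case is now explicit instead of a runtime crash.
    return "Unknown user";
  }
  return user.name;
}
```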

What AI-generated code tends to skip

This is our observation from working on inherited and AI-built codebases, consistent with published research.

Security, especially access control. The Veracode 2025 GenAI Code Security Report, which tested over 100 LLMs, found that AI-generated code introduced a detectable OWASP Top 10 security vulnerability in 45% of cases. Auth gaps and broken access control are the most common mobile manifestation: the client does the filtering and the server trusts it.

Error handling on anything but the happy path. The demo always shows the flow working. Real apps need to handle token expiry, network loss, API rate limits, and server errors gracefully. AI-generated code rarely includes this without being explicitly prompted for it.
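Below is a minimal sketch of one piece of that: deciding whether a failed request is worth retrying, and how long to wait. It assumes HTTP status codes and an optional `Retry-After` value in seconds. The retryable set, base delay, attempt limit, and cap are illustrative choices. Token expiry (401) is deliberately not retried here; it should go through a refresh-or-login path instead.

```typescript
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8000;

// 429 (rate limited) and 5xx are usually transient; other 4xx mean the request
// itself is wrong (or the session is invalid), so retrying won't help.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Returns the delay before the next attempt, or null to stop and show an error.
// `attempt` counts attempts already made (1 after the first failure).
function nextRetryDelayMs(status: number, attempt: number, retryAfterSeconds?: number): number | null {
  if (!isRetryable(status) || attempt >= MAX_ATTEMPTS) {
    return null;
  }
  if (retryAfterSeconds !== undefined && retryAfterSeconds >= 0) {
    // Respect the server's hint; if it asks for a longer wait than we're willing
    // to block the user for, stop and show a message instead.
    const hinted = retryAfterSeconds * 1000;
    return hinted <= MAX_DELAY_MS ? hinted : null;
  }
  // Exponential backoff: 500 ms, 1000 ms, 2000 ms, ... capped at MAX_DELAY_MS.
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), MAX_DELAY_MS);
}
```

Real clients usually add random jitter so that many devices don't retry in lockstep. It's left out here to keep the behavior deterministic and easy to test.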

Copy-pasted patterns that don't fit your stack. AI models generate plausible-looking code that may import the wrong library, use a deprecated API, or apply a pattern designed for a different framework. It compiles, it runs in debug, and then fails on a device, in a release build, or under load.

Tests. Unless you ask explicitly, AI coding tools don't generate tests for the code they write. That makes every change a gamble: nothing verifies that existing behavior still holds.

Defensible architecture. Everything is connected to everything. Swap one dependency and you're touching ten files. This isn't a disaster for a solo demo; it becomes one when you need to swap an auth provider, add multi-tenancy, or hand the codebase to another developer.
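One inexpensive step toward a defensible structure is putting each external dependency behind an interface your app owns. Below is a minimal sketch that assumes auth is the dependency you are most likely to swap. `AuthProvider`, `FakeAuthProvider`, and `requireUserId` are hypothetical names; a real adapter would wrap your provider's SDK.

```typescript
// The app depends on this interface, not on a vendor SDK.
interface AuthProvider {
  currentUserId(): string | null;
  signOut(): Promise<void>;
}

// One adapter per provider. Swapping providers means writing a new adapter,
// not editing every screen that needs the current user.
class FakeAuthProvider implements AuthProvider {
  private userId: string | null;

  constructor(initialUserId: string | null) {
    this.userId = initialUserId;
  }

  currentUserId(): string | null {
    return this.userId;
  }

  signOut(): Promise<void> {
    this.userId = null;
    return Promise.resolve();
  }
}

// App code receives the provider instead of importing an SDK directly,
// which also makes it testable with the fake above.
function requireUserId(auth: AuthProvider): string {
  const id = auth.currentUserId();
  if (id === null) {
    throw new Error("Not signed in");
  }
  return id;
}
```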

How this differs from store release

Store release is a separate concern that comes after hardening.

Getting your app into the stores involves signing, build artifacts, privacy manifests, and policy declarations. Those are unrelated to whether your auth logic is correct or your API keys are exposed. The vibe coding store release checklist covers all of that, including Android App Bundles, iOS archive and distribution, Data safety forms, and required-reason APIs.

The right order: harden first, then prepare for stores. Submitting an unprotected app faster doesn't help you; it just gets real user data into a system that wasn't ready for it.

When self-serve is enough vs. when to get help

Self-serve is reasonable if:

  • You're a solo builder with a single-platform app and time to work through the list above.
  • There are no payments, no PII (personally identifiable information) beyond basic auth, and no multi-user data isolation requirements.
  • You can reproduce and understand any failures yourself.

Get help if:

  • The app handles payments or sensitive user data.
  • You have multi-user auth and can't confidently verify data isolation.
  • Failures only appear in release builds or on real devices, not in your dev environment.
  • Every AI fix creates new problems and you've lost confidence in what's stable.
  • You're not sure where to start and a week of guessing would cost more than an audit.

This maps directly to the Production Readiness Audit on our vibe coded app rescue page: a ~1 week engagement for apps where the demo works but the code isn't ready for real users. We go through the checklist above, identify the highest-risk gaps, and give you a prioritized fix list with enough context to act on it yourself or with us.

Demo works but not ready for real users?

Work through secrets and auth first; those are the gaps that cause real damage. When you're ready for store submission, the store release checklist covers what comes next. Want a second set of eyes? The Production Readiness Audit (~1 week) is a fixed-scope pass through the checklist above. See all rescue packages.

Sources