AI-Built App Production Readiness Checklist

TLDR: AI tools optimise for the happy path. This is the unhappy path checklist.

I review AI-built apps before they go live. The pattern I see most often is not broken code. It is code that works exactly as the AI specified it, where the specification was silent about what happens when things go wrong.

These are the ten categories I check on every review. For each one, I have listed the questions that catch the most problems. If you cannot answer them confidently, investigate before you have real users and real money at stake.

One caveat: this checklist is a starting point, not a substitute for a code review. It tells you where to look. It cannot tell you what is actually in your codebase.


1. Authentication and authorization

Auth is the most reliably broken category in AI-built apps. Not because the login flow is wrong — it usually works. The failures are in everything around it: routes that require authentication in the UI but not in the API, role checks that exist client-side and nowhere else, session expiry the app never tests.

  • Can a logged-out user reach any route, API endpoint, or server action that should require authentication?
  • If you have role boundaries, can a lower-privileged user call an endpoint intended for a higher-privileged one?
  • What happens when a session expires mid-request?
  • Can a user access or modify another user's records by changing an ID in the URL or request body?
  • Is ownership enforced at the database query level, or only in the UI?

Auth bugs are silent in development because you test with your own account. They surface in production when someone probes your API directly or notices they can see another user's data. By then you have a breach, not a bug.

What good looks like: every protected route and API endpoint checks authentication and authorization server-side, regardless of what the UI shows. Ownership checks compare the authenticated user to the record owner before returning or mutating anything.
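
Here is a minimal sketch of that ownership check in Python, kept framework-agnostic. The names Invoice, INVOICES, get_invoice, and the two exception classes are hypothetical; in a real app the store would be your database and the exceptions would map to 401 and 404 responses.

```python
from dataclasses import dataclass
from typing import Optional


class NotAuthenticated(Exception):
    """No valid session on the request (would map to HTTP 401)."""


class NotFound(Exception):
    """Record is missing OR not owned by the caller (would map to HTTP 404)."""


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory store standing in for the database.
INVOICES = {
    1: Invoice(id=1, owner_id=10, total_cents=5000),
    2: Invoice(id=2, owner_id=20, total_cents=900),
}


def get_invoice(current_user_id: Optional[int], invoice_id: int) -> Invoice:
    """Return an invoice only if the authenticated user owns it."""
    if current_user_id is None:
        raise NotAuthenticated()
    invoice = INVOICES.get(invoice_id)
    # Same error for "missing" and "not yours", so IDs cannot be enumerated.
    if invoice is None or invoice.owner_id != current_user_id:
        raise NotFound()
    return invoice
```

The check runs on the server for every call, so it holds whether the request comes from the UI or from someone calling the API directly.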


2. Data boundaries

Multi-user apps need to keep user data separate. AI tools often get this right in the first feature and miss it in the second or third, when a new feature reuses an existing query without adding the owner filter.

  • Do all database queries that return user data filter on the authenticated user's ID?
  • Is there any admin path a regular user could reach through a URL, query parameter, or API call?
  • If you have multi-tenancy, can a user in one tenant access records belonging to another?
  • Are your database credentials scoped to minimum permissions, or does the app connect as a superuser?
  • Can list endpoints be used to enumerate other users' records?

Data leaks between users are invisible in testing when every test runs as a single account. They appear in production once two users have real data and a query does not filter correctly.

What good looks like: every query that touches user data includes an owner or tenant filter. Admin paths require an explicit check, not just a hidden URL. Database credentials can read and write what the app needs and nothing else.
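
One way to picture the owner filter, sketched with Python's built-in sqlite3 module. The notes table, its columns, and the function names are assumptions for illustration; the point is that the filter lives in the SQL itself, for writes as well as reads.

```python
import sqlite3


def list_notes(conn: sqlite3.Connection, user_id: int) -> list[tuple[int, str]]:
    """Return only the notes owned by user_id."""
    rows = conn.execute(
        "SELECT id, body FROM notes WHERE owner_id = ? ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()


def update_note(conn: sqlite3.Connection, user_id: int, note_id: int, body: str) -> bool:
    """Update a note only if it belongs to user_id; return False otherwise."""
    cur = conn.execute(
        "UPDATE notes SET body = ? WHERE id = ? AND owner_id = ?",
        (body, note_id, user_id),
    )
    # rowcount is 0 when the note does not exist or belongs to someone else.
    return cur.rowcount == 1


# Hypothetical demo data: two users, one note each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, owner_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO notes (id, owner_id, body) VALUES (?, ?, ?)",
    [(1, 10, "first user's note"), (2, 20, "second user's note")],
)
```

With this shape, a new feature that reuses list_notes or update_note inherits the filter instead of having to remember it.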


3. Secrets and environment configuration

AI tools generate code with hardcoded credentials more often than you might expect. Not always obviously — sometimes it is a default password that was never changed, or a key in a comment that made it into source control.

  • Search your codebase for sk-, Bearer , password =, api_key, and secret. Are any hardcoded?
  • Is .env in .gitignore? Run git log --all --full-history -- .env to confirm it has never been committed on any branch. Any output means it was committed at some point.
  • Do you have separate credentials for development and production?
  • Do your keys have minimum necessary permissions?
  • Do you know how to rotate each secret, and have you documented where each is used?

Hardcoded secrets in a public repository can be found by automated scanners within minutes. A leaked production key can mean unexpected charges, deleted data, or a compromised third-party account.

What good looks like: no credentials in source code. .env files are gitignored and have never appeared in git history. Production and development use separate, scoped credentials. Rotation is documented.
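
The search from the first question can be roughed out as a small script. This is a sketch, not a replacement for a dedicated secret scanner: the patterns are my own loose approximations of the terms listed above, and they will produce both false positives and misses.

```python
import re

# Loose patterns based on the search terms above. They are assumptions for
# illustration and every hit still needs a human to review it.
SUSPICIOUS = [
    re.compile(r"sk-[A-Za-z0-9]{8,}"),
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]{8,}"),
    re.compile(r"(password|api_key|secret)\s*[:=]\s*['\"][^'\"]+['\"]", re.IGNORECASE),
]


def find_suspect_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that match any pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS):
            hits.append((number, line.strip()))
    return hits
```

A line that reads a key from the environment passes. A line that assigns a quoted literal to api_key does not.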


4. Input validation and server-side trust

AI tools add validation in the form. They often skip it in the server-side route or action that processes the form, because in the happy path the form is the only thing that calls the endpoint.

Everyone else uses curl.

  • Do your API routes and server actions validate input independently of what the client sent?
  • Is there a file upload path? Are file type, size, and content validated server-side?
  • Do you accept webhooks? Are you verifying signatures before processing the payload?
  • Can a user submit unexpected values — negative numbers, very long strings, wrong types — that pass through unchecked?
  • Is user-submitted content ever rendered as HTML without sanitization?

Client-side validation is UX. Server-side validation is security. Any endpoint that skips server-side checks can be called directly, bypassing everything the form enforces.
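
A minimal sketch of what that server-side check might look like for a single endpoint. The order payload, its field names, and the limits are hypothetical; the shape is what matters: check type, check range, and pass on only the fields you expect.

```python
MAX_QUANTITY = 100      # hypothetical business limit
MAX_NOTE_LENGTH = 500   # hypothetical business limit


def validate_order(payload: object) -> dict:
    """Validate an order payload on the server; raise ValueError on bad input."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    quantity = payload.get("quantity")
    note = payload.get("note", "")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError("quantity must be an integer")
    if not 1 <= quantity <= MAX_QUANTITY:
        raise ValueError(f"quantity must be between 1 and {MAX_QUANTITY}")
    if not isinstance(note, str) or len(note) > MAX_NOTE_LENGTH:
        raise ValueError(f"note must be a string of at most {MAX_NOTE_LENGTH} characters")
    # Return only known fields, so unexpected keys never reach the database.
    return {"quantity": quantity, "note": note}
```

The form can enforce the same rules for a better user experience. The server enforces them because it cannot assume the form was ever involved.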

What good looks like: every route and server action validates its own inputs. Webhook signatures are verified before the payload is touched. User content is sanitized before storage or rendering.
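
Webhook verification usually comes down to recomputing a signature over the raw request body and comparing it in constant time. The sketch below assumes a generic HMAC-SHA256 scheme with a hex-encoded signature. Real providers define their own header names, encodings, and timestamp rules, so check their documentation rather than copying this as-is.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare the received signature to the expected one in constant time."""
    expected = sign(secret, body)
    # Encode both sides so compare_digest accepts any input string.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The handler should call is_valid_signature on the raw bytes before parsing the payload, and reject the request if it returns False.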


5. Deployment readiness

AI tools help you build the app. They do not help you ship it reliably. Deployment readiness is usually the last thing anyone addresses and the first thing that causes an outage.

  • Can you reproduce a working build from a clean checkout, with no local files that are not in source control?
  • Are your production environment variables documented and confirmed to be set?
  • Do you have a migration strategy for schema changes? What happens to existing data?
  • If a deployment fails, what is the rollback plan, and have you tested it?
  • Is there a named person responsible for production, or does everyone own it?

Apps built entirely with AI tools often have implicit dependencies: a local file, a hardcoded path, a flag that only works in dev. These break the first time the app runs somewhere other than the developer's machine.

What good looks like: a clean build from the repository completes successfully. Migrations are scripted and tested. There is a rollback plan and a named person responsible for production.
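
One cheap way to confirm production environment variables are set is to fail at startup when any are missing. A sketch, assuming three hypothetical variable names; substitute whatever your app actually needs.

```python
import os
from typing import Mapping

# Hypothetical names. Keep this list next to your deployment documentation.
REQUIRED_VARS = ("DATABASE_URL", "SESSION_SECRET", "PAYMENT_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def assert_env_ready(env: Mapping[str, str] = os.environ) -> None:
    """Raise at startup rather than failing on the first real request."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
```

Calling assert_env_ready() as the first step of startup turns a missing variable into a failed deploy instead of a runtime error a user finds.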


6. Monitoring and observability

If the app breaks in production and no one is watching, it breaks silently. Most AI-built apps launch with no error tracking, no structured logs, and no uptime monitoring. The first signal that something is wrong is a user complaint.

  • Are errors captured somewhere you can actually see them — not just console logs, but a real error tracking tool?
  • Do you have structured logs for meaningful events: signups, payments, background jobs?
  • Is there an uptime check that alerts you when the app goes down?
  • Do you know what normal traffic and error rates look like? Will you notice when something is wrong?
  • Are your analytics capturing the events that matter for the business?

Without observability, you are flying blind. By the time a user reports an error, it has usually been happening for hours.

What good looks like: errors surface in a tool you check. An uptime monitor alerts when the app is unreachable. You have a baseline for normal, so abnormal is detectable.
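
Structured logs can start small. This sketch uses Python's standard logging module to emit one JSON object per event. The logger name, the "fields" convention, and the event names are my assumptions, not a standard.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            # Extra context is passed via extra={"fields": {...}} (a convention
            # chosen for this sketch).
            **getattr(record, "fields", {}),
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(stream=sys.stderr) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("app.events")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as build_logger().info("payment_failed", extra={"fields": {"user_id": 10}}) then produces a line that a log tool can filter by event or user, which plain console output cannot offer.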


7. Error handling and recovery

AI tools handle the success case well. They rarely handle failures. In production, failures are constant: network timeouts, third-party API errors, database blips, payment failures. None of these appeared in the happy path the AI was optimising for.

  • What does a user see when a third-party API call fails — a useful message or a broken screen?
  • Do background jobs retry on failure? How many times, and what happens when they exhaust retries?
  • If a payment fails mid-transaction, is the data left in a consistent state?
  • Do webhook handlers return the right status codes so the sender knows whether to retry?
  • What happens when the database connection drops temporarily?

The more third-party integrations an app has, the more failure modes it has. AI tools generate the integration code. They do not generate the recovery code.

What good looks like: every external call has error handling. Background jobs retry with backoff. Failed operations leave data consistent. Users see something useful when things go wrong.
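
Retry with backoff is short enough to sketch in full. TransientError is a hypothetical exception standing in for whatever your client library raises on timeouts or 5xx responses; the attempt count and delays are placeholders to tune.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx)."""


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # retries exhausted: let the caller decide what happens
            # Delays double each time: base, 2x base, 4x base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only retry operations that are safe to repeat. A payment call without an idempotency key is not one of them. Production versions usually also add random jitter to the delay.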


8. Dependencies and generated code ownership

AI tools pull in packages to solve problems. Usually the right ones. Sometimes abandoned, vulnerable, or unnecessary ones. Once that code is in your repository, it is yours to own.

  • When did you last run npm audit or equivalent? Are there known vulnerabilities?
  • Is a lockfile committed and used consistently in production?
  • Are there packages in your dependency list you cannot explain the purpose of?
  • Is there generated code in your codebase you have never read? Could you explain what it does?
  • How far behind is your framework or runtime from the current supported version?

Unreviewed dependencies and unread generated code are a liability. If a vulnerability appears in a package you did not know you had, you cannot respond quickly. If generated code does something unexpected, no one will catch it because no one reviewed it.

What good looks like: dependencies are understood and tracked. All generated code has been read by a human who can explain what it does.


9. Tests and verification

AI tools generate tests. They generate tests for the code they built, not necessarily for the behavior that matters. An app with a high test count can still have no coverage of its critical path.

  • Does your test suite cover the authentication and login flow end to end?
  • Are there tests that verify one user cannot access another user's data?
  • Is the payment or core conversion path tested?
  • Do your tests run against something close to a real database, or only mocks?
  • Have you walked through the critical path manually in a production-like environment before launch?

The most important flows in the app, the ones that must work for users to get any value, are often the last to get real test coverage, because the AI tested what it built most recently.

What good looks like: critical paths have tests. Auth and data-boundary behaviors are verified. There is at least one environment where you can test against real infrastructure before shipping.
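
A data-boundary test does not need to be elaborate. The sketch below uses Python's unittest with a hypothetical in-memory DocumentStore; in a real suite the same two assertions would run against your actual data layer and a test database.

```python
import unittest


class DocumentStore:
    """Hypothetical stand-in for the app's data layer."""

    def __init__(self) -> None:
        self._docs: dict[int, dict] = {}

    def add(self, doc_id: int, owner_id: int, body: str) -> None:
        self._docs[doc_id] = {"owner_id": owner_id, "body": body}

    def read(self, user_id: int, doc_id: int) -> str:
        doc = self._docs.get(doc_id)
        if doc is None or doc["owner_id"] != user_id:
            raise PermissionError("not found or not owned by caller")
        return doc["body"]


class DataBoundaryTest(unittest.TestCase):
    def setUp(self) -> None:
        self.store = DocumentStore()
        self.store.add(doc_id=1, owner_id=10, body="private")

    def test_owner_can_read_own_document(self) -> None:
        self.assertEqual(self.store.read(user_id=10, doc_id=1), "private")

    def test_other_user_cannot_read_document(self) -> None:
        # The negative case is the one that is usually missing.
        with self.assertRaises(PermissionError):
            self.store.read(user_id=20, doc_id=1)
```

The second test is the one that matters. It fails the day a new feature reuses a query without the owner check.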


10. Maintainability

AI-built apps accumulate technical debt quickly because each session generates code without full context of what already exists. The result is duplicated logic, inconsistent naming, and a codebase that is hard to change safely.

  • Is similar logic centralised, or implemented in multiple places?
  • Can you identify who is responsible for each part of the codebase?
  • Is configuration managed centrally, or scattered and hardcoded across files?
  • If you needed to change how authentication works, do you know which files to touch?
  • Could a new developer navigate the project without you explaining it first?

An app you cannot change safely is fragile. If fixing a bug in one place breaks something else because the logic is duplicated, you stop shipping confidently. Maintainability determines whether you can move fast in three months, not just today.

What good looks like: logic lives in one place. Configuration is centralised. A new developer can find their way around without a tour.


How to read the results

Count the questions you cannot answer confidently.

0–2: You are in reasonable shape. Investigate the specific gaps before launch.

3–5: There are real risks worth addressing before you have live users and live data at stake.

6+: Gaps like these are usually connected. Weak auth tends to come with weak data boundaries, and missing validation tends to come with missing error handling. Consider a structured review before shipping.

This checklist tells you where to look. It does not tell you what is actually in your codebase. That requires a code review.


The free production risk audit

If you have found gaps, or you want a second opinion before the app handles real users and real money, the free AnchorStack production risk audit is the next step.

I review your codebase across the same ten categories and return a prioritised report of what is likely to break first and what to fix before it does. It is not a sales process. You get the findings whether or not you want to continue working together.
