Every organization that has deployed a machine learning model, a generative AI assistant, or an embedded AI feature inside a SaaS platform now carries a question its internal audit function was not originally built to answer: who actually checks that the algorithm is doing what it is supposed to do? Traditional IT General Controls testing was designed for deterministic software — code that, given the same input, always produces the same output. AI systems do not behave that way. The same support ticket, loan application, or resume can receive a different outcome depending on model version, prompt configuration, or quiet drift in production data. AI Audit Assurance is the emerging discipline that closes this gap, extending internal audit and SOX/ITGC rigor to cover the parts of an AI system that access reviews and change tickets were never designed to test.
This article lays out the methodology, the regulatory context driving it, and three short field examples. At the end, we have packaged the full step-by-step playbook — with worked examples, a ready-to-use AI control matrix, and a fieldwork checklist — into a companion PDF you can download.
A financial auditor evaluating a complex accounting estimate does not stop at confirming the spreadsheet is access-controlled — they independently test the inputs, the methodology, and the output. AI systems deserve the same treatment. A credit-scoring model, a resume-ranking tool, or a generative AI support agent is, in audit terms, a complex estimate that happens to run continuously rather than once a quarter. Testing only the access controls wrapped around it — and never the substance of what it decides — leaves the highest-risk part of the system completely untested.
No single regulation owns this space yet, but four frameworks are converging on a common set of expectations that internal audit teams are already being asked to demonstrate against:
The methodology sequences naturally from "what AI exists" through "is it still behaving as expected." Each step produces evidence that feeds the next.
You cannot audit what has not been catalogued. The inventory step builds a register of every AI and machine learning system in scope — including embedded AI quietly enabled inside third-party SaaS tools (sometimes called "shadow AI") that the business may not even think of as AI. Each system is then assigned a risk tier based on the severity of harm if it fails and how much autonomy it has, which directly determines how deep the remaining steps need to go.
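As a rough illustration of the tiering logic, the sketch below scores each inventory entry on severity of harm and degree of autonomy and maps the product onto three tiers. The 1-3 scales, the cut-offs, the field names, and the sample systems and owners are all assumptions made for this example, not a prescribed standard; a real program would define its own criteria.

```python
from dataclasses import dataclass

# Hypothetical 1-3 scales; substitute the organization's own risk criteria.
SEVERITY = {"low": 1, "moderate": 2, "severe": 3}
AUTONOMY = {"advisory": 1, "human_in_loop": 2, "autonomous": 3}


@dataclass(frozen=True)
class AISystem:
    name: str
    owner: str
    severity: str                   # severity of harm if the system fails
    autonomy: str                   # how much it decides without a human
    embedded_in_saas: bool = False  # "shadow AI" inside a third-party tool


def risk_score(system: AISystem) -> int:
    """Severity multiplied by autonomy, on the illustrative scales above."""
    return SEVERITY[system.severity] * AUTONOMY[system.autonomy]


def risk_tier(system: AISystem) -> str:
    """Map the score onto three tiers; the cut-offs are assumptions."""
    score = risk_score(system)
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# Made-up register entries for illustration only.
inventory = [
    AISystem("meeting-notes summarizer", "IT", "low", "advisory", embedded_in_saas=True),
    AISystem("credit-scoring model", "Lending", "severe", "autonomous"),
    AISystem("resume-ranking tool", "HR", "severe", "human_in_loop"),
]

# Highest-risk systems first, so audit depth can follow the ordering.
for system in sorted(inventory, key=risk_score, reverse=True):
    print(f"{system.name}: {risk_tier(system)}")
```

Sorting by score gives a simple way to decide which system receives the deepest testing first.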
This step tests whether the paperwork that should exist actually exists and reflects reality: a model or system card, training data lineage, a pre-deployment approval record, a named accountable owner, and an incident log. Documentation produced reactively once an audit is announced — rather than maintained throughout the system's life — is itself evidence that the underlying control is not really operating.
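One way to make the "reactive documentation" test concrete is to compare each artifact's creation date against the deployment date and the audit announcement date. The sketch below assumes the auditor has already collected those dates into a dictionary; the artifact keys, the function name, and the sample dates are hypothetical.

```python
from datetime import date

# The five artifacts named above, using hypothetical key names.
REQUIRED_ARTIFACTS = (
    "model_card",
    "training_data_lineage",
    "pre_deployment_approval",
    "accountable_owner",
    "incident_log",
)


def documentation_findings(artifacts, deployed_on, audit_announced_on):
    """Return one finding per artifact that is missing or suspiciously dated.

    artifacts maps an artifact name to the date it was first created.
    """
    findings = []
    for name in REQUIRED_ARTIFACTS:
        created = artifacts.get(name)
        if created is None:
            findings.append(f"{name}: missing")
        elif created >= audit_announced_on:
            findings.append(f"{name}: created after audit announcement")
        elif name == "pre_deployment_approval" and created > deployed_on:
            findings.append(f"{name}: dated after deployment")
    return findings


# Made-up evidence dates for illustration only.
evidence = {
    "model_card": date(2024, 1, 10),
    "training_data_lineage": date(2024, 1, 10),
    "pre_deployment_approval": date(2024, 3, 1),
    "incident_log": date(2024, 9, 5),
}
for finding in documentation_findings(
    evidence, deployed_on=date(2024, 2, 1), audit_announced_on=date(2024, 9, 1)
):
    print(finding)
```

A creation date is only as reliable as the system that records it, so in practice the dates would come from version history or a document management log rather than from the file itself.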
This is where AI audit assurance diverges most clearly from traditional ITGC. The auditor independently tests training data provenance, checks for data drift between training and production, and, for generative AI, runs adversarial prompts to confirm the system stays within its documented authority. Traditional ITGC domains still apply but need AI-specific extensions: access controls over model weights and prompt configuration, and version-controlled retrains with a mandatory re-validation gate. One domain is entirely new: human override. Here the test is that a reviewer can and does meaningfully intervene before harm occurs, not merely that an override button exists.
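The drift check can be sketched with the Population Stability Index (PSI), one common way to compare the distribution of a feature in training data against production data. This is a swapped-in technique chosen for illustration, not necessarily the one a given team uses. The function name, the bin count, and the floor applied to empty bins are assumptions.

```python
import math
from bisect import bisect_right


def psi(expected, actual, bins=10):
    """Population Stability Index between a training and a production sample.

    Bins are equal-width over the training range; production values outside
    that range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("training sample has no spread to bin")
    # Interior bin edges derived from the training sample only.
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data: production values shifted upward relative to training.
training = [i / 100 for i in range(100)]
production = [x + 0.5 for x in training]

print(f"PSI, identical samples: {psi(training, training):.3f}")
print(f"PSI, shifted sample:    {psi(training, production):.3f}")
```

A widely used rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift, but those cut-offs are conventions, not regulatory thresholds, and the audit file should record whichever threshold was applied and why.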
For any system that touches individuals, outcome distributions are independently measured across relevant groups using metrics like disparate impact ratio and equalized error rates — tracked on a recurring cadence, not validated once at launch and forgotten.
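Both metrics named above can be computed from a table of decisions. The following is a minimal sketch, assuming each record carries a group label, the model's decision, and the true outcome. The record layout, function names, and sample data are all hypothetical.

```python
from collections import defaultdict


def selection_rates(records):
    """records: iterable of (group, predicted_positive, actual_positive)."""
    totals, selected = defaultdict(int), defaultdict(int)
    for group, predicted, _actual in records:
        totals[group] += 1
        selected[group] += int(predicted)
    return {g: selected[g] / totals[g] for g in totals}


def disparate_impact_ratio(records):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(records)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received a favorable outcome")
    return min(rates.values()) / highest


def error_rates(records):
    """False positive rate and false negative rate for each group."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, predicted, actual in records:
        if actual:
            counts[group]["pos"] += 1
            counts[group]["fn"] += int(not predicted)
        else:
            counts[group]["neg"] += 1
            counts[group]["fp"] += int(predicted)
    return {
        g: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for g, c in counts.items()
    }


# Made-up decisions: group A is selected 6 times in 10, group B 3 times in 10.
records = (
    [("A", True, True)] * 5 + [("A", True, False)] * 1 + [("A", False, False)] * 4
    + [("B", True, True)] * 3 + [("B", False, True)] * 2 + [("B", False, False)] * 5
)

print(selection_rates(records))
print(round(disparate_impact_ratio(records), 2))
print(error_rates(records))
```

A ratio below 0.8 is often cited as a flag for further review, following the four-fifths rule of thumb from US employment selection guidance. It is a screening heuristic rather than a legal conclusion, and the gap in false negative rates between groups deserves attention in its own right.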
A point-in-time audit answers whether a model was acceptable on the day it was tested. Continuous monitoring answers the more important question: is it still acceptable today? That means automated drift dashboards, a re-certification trigger whenever the model is retrained or its prompt logic changes, and an audit-ready evidence repository producible on demand.
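The re-certification trigger can be approximated by fingerprinting whatever was validated and comparing it with what is running now. The sketch below assumes the model version is a string and the prompt configuration is a JSON-serializable dictionary; the function names and configuration keys are hypothetical.

```python
import hashlib
import json


def fingerprint(model_version, prompt_config):
    """Stable SHA-256 hash of the items that should force re-validation."""
    payload = json.dumps(
        {"model_version": model_version, "prompt_config": prompt_config},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_recertification(certified_fingerprint, model_version, prompt_config):
    """True when the running configuration differs from the certified one."""
    return fingerprint(model_version, prompt_config) != certified_fingerprint


# Fingerprint recorded in the evidence repository at the last certification.
certified = fingerprint("2024-06-retrain", {"system_prompt": "Answer billing questions only.", "temperature": 0})

# Same settings in a different key order: no trigger.
print(needs_recertification(certified, "2024-06-retrain", {"temperature": 0, "system_prompt": "Answer billing questions only."}))

# Prompt logic changed: re-certification is triggered.
print(needs_recertification(certified, "2024-06-retrain", {"temperature": 0, "system_prompt": "Answer billing and refund questions."}))
```

Storing the certified fingerprint alongside the validation evidence also gives the auditor a quick way to confirm that what was tested is what is deployed.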
The full playbook below walks through each of these in depth, with the actual audit steps performed and the finding that resulted. In short:
Organizations adding AI to ITGC scope for the first time do not need to test everything at once. A practical first month looks like: build the inventory and interview process owners directly rather than relying solely on a SaaS asset register; assign a risk tier to every system found; and pick the single highest-tier system for a full Step 3–7 walkthrough as a proof of concept before scaling the program across the rest of the inventory.
16 pages: the full 7-step methodology, three worked examples (credit scoring, resume screening, generative AI), a ready-to-use AI control matrix, and a fieldwork checklist.