Third-party risk management for AI integrations: the diligence most programmes are not running

Most third-party risk management programmes I review were built to answer a specific set of questions: is the vendor financially stable, is their security posture adequate, do they process personal data correctly, will they notify us of breaches, can we exit the relationship cleanly. The questionnaires, the diligence workflows, the ongoing monitoring cadence — all of it was designed for a world in which vendors provided services that processed data and executed business logic. That world still exists and those questions still matter. But a new class of vendor risk has entered the supply chain, and most TPRM programmes have not been updated to address it.

When you integrate a third-party AI model, API, or agent into your product, you are inheriting risk surfaces that do not appear in any traditional security questionnaire. The vendor is not just processing your data — they are producing outputs on which your product’s behaviour, your users’ experience, and increasingly your regulatory posture depend. Managing that risk requires specific diligence that most programmes are not running. This article is about what that diligence looks like.

Why classic TPRM misses AI risk

A standard vendor questionnaire asks about ISO 27001 certification, SOC 2 attestation, GDPR compliance, data residency, breach notification timelines, sub-processor lists, and contractual safeguards. Each of those is answered by the vendor’s security and privacy teams against a reasonably stable set of facts. If the answers are satisfactory, the vendor clears diligence and enters the register with a tier.

AI integrations do not yield clean answers to those questions. The security questionnaire asks about breach history, but AI model risk is rarely a “breach” in the SOC 2 sense. The privacy questionnaire asks about data processing, but the vendor’s model may have been trained on data you would not have consented to, and that training shapes the outputs you receive today. The SLA asks about uptime, but silent degradation of model quality is invisible to uptime monitoring: the endpoint stays up while the answers get worse. Classic TPRM simply does not ask the questions that expose AI-specific risk.

Five risk surfaces matter for AI vendors that traditional TPRM does not address. I will walk through each and describe the diligence that actually works.

Risk surface one — training data provenance

Every model is a compressed representation of the data it was trained on. If that data includes copyrighted material without licence, personal data processed without a lawful basis, or data that produces biased outputs in your use case, those issues propagate through every output the vendor’s model produces for you.

Diligence that works: ask the vendor for a data provenance statement. Specifically: what categories of data were used for training, what sources were used, what filtering and curation were applied, what licensing arrangements cover the training corpus, and what representations the vendor makes about the legality of its training data in your jurisdiction. The serious providers can answer this. The ones that cannot are early-stage, have genuinely not done the work, or are hoping you will not ask. None of those is a reason for you to carry the risk downstream.

For EU deployments, note that Article 53 of the AI Act requires providers of general-purpose AI (GPAI) models to make publicly available a sufficiently detailed summary of the content used for training, and to put in place a policy to comply with Union copyright law. If a vendor cannot point you to this for a GPAI model you integrate, that is a regulatory issue for them and a diligence failure point for you.
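One way to make the provenance request auditable is to record the vendor’s answers as a structured object and flag the gaps, rather than filing a free-text response. The sketch below is illustrative only: `ProvenanceStatement`, `provenance_gaps` and every field name are hypothetical, chosen to mirror the questions above, and the extra EU field stands in for the Article 53 training-content summary.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProvenanceStatement:
    """Hypothetical record of a vendor's training-data provenance answers.

    Field names are illustrative; None (or an empty string) means unanswered.
    """
    data_categories: Optional[str] = None
    data_sources: Optional[str] = None
    filtering_and_curation: Optional[str] = None
    licensing_arrangements: Optional[str] = None
    legality_representation: Optional[str] = None
    # Only required when the integrated model is a GPAI model deployed in the EU.
    gpai_training_summary_url: Optional[str] = None


def provenance_gaps(stmt: ProvenanceStatement, eu_gpai: bool) -> list[str]:
    """Return the names of required fields the vendor left unanswered."""
    required = [f.name for f in fields(stmt) if f.name != "gpai_training_summary_url"]
    if eu_gpai:
        # Article 53 AI Act: GPAI providers publish a training-content summary.
        required.append("gpai_training_summary_url")
    return [name for name in required if not getattr(stmt, name)]
```

For example, a statement that only covers categories and sources, for an EU GPAI integration, returns the four unanswered items (`filtering_and_curation`, `licensing_arrangements`, `legality_representation`, `gpai_training_summary_url`), which become the follow-up questions to the vendor.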

Risk surface two — model behaviour and evaluation transparency

Classic SaaS diligence can verify that a system does what it says through reference calls, demos, and technical documentation. Model behaviour is harder to verify. A model can appear to work well in your tests and fail in production on edge cases that matter. It can work well today and degrade after an update. It can produce outputs that are plausible but wrong, and the plausibility is itself the problem.

Diligence that works: ask for the vendor’s evaluation framework. Which benchmarks is the model evaluated against, and how does it perform on each? Which evaluations are run on each update? How is regression detected and communicated? For closed-source APIs, this is often packaged in a system card or model card; use it as the starting point, not the destination. For models that sit in high-stakes workflows, add your own evaluation harness and run it against the vendor’s model on a cadence. Treat model behaviour as an ongoing diligence activity, not a one-time assessment.
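A minimal sketch of such a harness, assuming your own labelled cases and a `model_fn` callable that wraps whatever vendor client you use (both hypothetical here). It scores by normalised exact match and flags a regression when the score falls more than a chosen tolerance below a stored baseline. Real workflows usually need task-specific scoring, but the shape stays the same: fixed cases, a stored baseline, a scheduled comparison.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def _norm(text: str) -> str:
    # Collapse whitespace and ignore case so trivial formatting changes don't count.
    return " ".join(text.split()).lower()


def pass_rate(model_fn: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the model's output matches the expected answer."""
    if not cases:
        raise ValueError("evaluation set is empty")
    hits = sum(_norm(model_fn(c.prompt)) == _norm(c.expected) for c in cases)
    return hits / len(cases)


def is_regression(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True if the current score has dropped more than `tolerance` below baseline."""
    return current < baseline - tolerance
```

In a scheduled job, `pass_rate` runs against the vendor’s current model and `is_regression` compares the result with the score recorded when the model was approved; a `True` result is a trigger for the vendor conversation, not just a dashboard entry. The 0.02 default tolerance is an arbitrary placeholder.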

Risk surface three — change management and model update cadence

SaaS vendors update their software, and classic TPRM handles this through change management clauses and version-control guarantees. AI vendors update their models, and the update can meaningfully change the model’s behaviour without any obvious versioning signal. The API version string remains the same. The outputs shift.

Diligence that works: ask about update cadence, update notification policy, the ability to pin to a specific model version, and deprecation timelines. Several major providers now offer dated model versions; use them in production for regulated workflows and move to newer models only after you have evaluated them yourself. This is not a theoretical concern. I have seen software as a medical device (SaMD) teams discover that a clinical decision pipeline’s outputs had shifted because the upstream vendor had updated the model; the team then had to re-evaluate clinical performance to defend its regulatory submission.
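Pinning only helps if nobody quietly reverts to a floating alias. A small configuration check can enforce it. The sketch assumes a hypothetical config mapping each workflow to a model ID and a `regulated` flag, and assumes the provider marks pinned snapshots with a trailing date; naming conventions differ between providers, so the regex is an assumption to adapt, not a standard.

```python
import re

# Assumption: pinned snapshots end in a date such as "-2024-08-06" or "-20240806".
# Check your provider's actual naming convention before relying on this.
DATED_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def unpinned_regulated_workflows(config: dict[str, dict]) -> list[str]:
    """Return regulated workflows whose model ID looks like a floating alias."""
    return [
        name
        for name, entry in config.items()
        if entry.get("regulated") and not DATED_SUFFIX.search(entry["model"])
    ]


# Hypothetical configuration; the model IDs are placeholders, not real products.
config = {
    "triage-summary": {"model": "vendor-model-2024-08-06", "regulated": True},
    "clinical-flagging": {"model": "vendor-model-latest", "regulated": True},
    "marketing-copy": {"model": "vendor-model-latest", "regulated": False},
}
```

Run in CI, `unpinned_regulated_workflows(config)` returns `["clinical-flagging"]` here and can fail the build, while the unregulated workflow is free to track the latest model.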

Contractually, model update notification with a minimum notice period and the right to roll back should be non-negotiable for any AI vendor embedded in a regulated workflow.

Risk surface four — the vendor’s own governance posture

When you integrate an AI vendor into a high-risk workflow, your conformity obligations under the EU AI Act, your GDPR obligations around automated decision-making, and your contractual obligations to your own customers all assume that the vendor has an appropriate governance posture. If the vendor does not have a management system for the AI systems they operate, does not assess their own risks, does not have incident response for model misbehaviour — you are inheriting governance debt that shows up in your own audit.

Diligence that works: ask whether the vendor operates an ISO 42001 management system or equivalent. Ask for their AI risk assessment methodology. Ask about incident response for model-specific incidents — what counts as an incident, how it is classified, how it is reported to customers, what the remediation process looks like. Ask whether they have an AI ethics or safety review function and what decisions it has made. The quality of the answers maps directly onto the risk you are inheriting.

Risk surface five — contractual allocation of AI Act responsibility

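Under Article 25 of the EU AI Act, a deployer or other third party that makes a substantial modification to a high-risk AI system, or modifies the intended purpose of an AI system (including a general-purpose AI system) so that it becomes high-risk, is considered the provider of a high-risk AI system and carries the full Article 16 obligations. In those cases the contract with the vendor does not move the provider role back upstream; Article 25(1)(a) preserves contractual allocation only for the case where you put your name or trademark on a high-risk system. But the contract can allocate information-provision obligations that make your conformity work feasible or infeasible, and Article 25(4) expects a written agreement with suppliers of components integrated into a high-risk system that specifies the information, technical access and assistance you need.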

Diligence that works: specific contractual clauses on the vendor’s obligation to provide technical documentation that supports your Annex IV file; assistance with conformity assessment where applicable; cooperation on post-market monitoring and serious-incident reporting; obligations to notify you of regulatory actions affecting the model; rights to information about evaluations and known limitations. These are not standard master services agreement terms. They need to be negotiated deliberately for AI integrations that land you in regulated territory.

Operationalising the diligence

A TPRM programme that handles AI risk well does three things differently from a classic one.

First, the vendor tiering acknowledges AI-specific risk. An AI vendor embedded in a high-stakes product workflow is not a tier-three SaaS. It is a tier-one vendor regardless of data volume, because a failure of the model affects product behaviour directly. Tier the vendor to the consequence, not the data footprint.
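To make “tier to the consequence” concrete, here is a hedged sketch of a tiering rule in which consequence sets the floor and data volume can only raise the tier, never lower it. The `Consequence` scale and the one-million-record threshold are hypothetical placeholders for whatever your programme already defines.

```python
from enum import IntEnum


class Consequence(IntEnum):
    """Hypothetical scale for what a vendor failure does to the product."""
    NEGLIGIBLE = 0
    DEGRADED_EXPERIENCE = 1
    WRONG_PRODUCT_BEHAVIOUR = 2
    REGULATED_OUTCOME = 3


# Assumed threshold for a "large" data footprint; replace with your own.
LARGE_FOOTPRINT = 1_000_000


def assign_tier(consequence: Consequence, records_processed: int) -> int:
    """Return a tier where 1 is highest risk.

    A vendor whose failure changes product behaviour is tier 1 regardless of
    data volume; for everything else, a large footprint raises the tier by one.
    """
    if consequence >= Consequence.WRONG_PRODUCT_BEHAVIOUR:
        return 1
    tier = 2 if consequence == Consequence.DEGRADED_EXPERIENCE else 3
    if records_processed > LARGE_FOOTPRINT:
        tier -= 1
    return tier
```

An AI vendor embedded in a clinical workflow that sees only a few hundred records still lands in tier 1, which is the point: the classic data-footprint logic survives, but it no longer decides the outcome on its own.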

Second, the diligence workflow includes an AI-specific module. The standard ISO 27001 + GDPR questionnaire is the baseline. On top of that, AI vendors receive a supplementary diligence pack covering training data, evaluation, update cadence, governance posture, and AI Act responsibility allocation. This is not optional for vendors in scope.

Third, the ongoing monitoring cadence includes model-behaviour monitoring. You run your own evaluations against the vendor’s current model on a regular cadence. You maintain a pinned version in regulated workflows. You have a documented decision process for adopting model updates that includes regression testing.
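The documented decision process can be as simple as a gate that compares per-slice evaluation scores for the pinned and candidate versions and writes a decision record for the file. This is a sketch under stated assumptions: `UpdateDecision` and `decide_update` are hypothetical names, scores are assumed to come from your own harness broken down by slice (for example, edge cases separately from the overall set), and a slice missing from the candidate results is treated as a regression.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class UpdateDecision:
    """Hypothetical decision record kept in the TPRM file for each model update."""
    workflow: str
    pinned_model: str
    candidate_model: str
    regressed_slices: tuple[str, ...]
    adopt: bool
    decided_on: date


def decide_update(
    workflow: str,
    pinned_model: str,
    pinned_scores: dict[str, float],
    candidate_model: str,
    candidate_scores: dict[str, float],
    tolerance: float = 0.0,
    decided_on: Optional[date] = None,
) -> UpdateDecision:
    """Adopt the candidate only if no evaluation slice drops by more than `tolerance`."""
    regressed = tuple(
        name
        for name, old in pinned_scores.items()
        # Missing results count as a regression: untested is not the same as passed.
        if name not in candidate_scores or candidate_scores[name] < old - tolerance
    )
    return UpdateDecision(
        workflow=workflow,
        pinned_model=pinned_model,
        candidate_model=candidate_model,
        regressed_slices=regressed,
        adopt=not regressed,
        decided_on=decided_on or date.today(),
    )
```

Checking slices rather than one aggregate score matters because a candidate can improve on average while getting worse on exactly the edge cases a regulated workflow depends on.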

None of this is experimental practice. It is the operational consequence of the regulations that are now binding. The TPRM programmes that are not updating for AI risk are running on the assumption that the vendor will carry the load. That assumption is wrong in most of the cases I see — and the gap is a supply-chain risk that sits quietly in the register until a procurement review, an audit, or a regulatory inspection surfaces it.

The good news is that the diligence work is not conceptually hard. The gap is usually one that a focused two-to-three-week effort can close — if the question is taken seriously early rather than discovered late.