Technical deep dive10 min read

Securing the AI supply chain: from model to inference

A technical examination of the AI supply chain's unique attack surfaces, from model provenance and prompt injection to data exfiltration, and the case for infrastructure-level governance as a primary line of defence.

Every software team now understands supply chain risk. The lesson was expensive -- SolarWinds, Log4Shell, and the steady drumbeat of compromised npm packages taught the industry that trusting upstream dependencies without verification is a liability. Software bills of materials, signed artefacts, and provenance attestations are now standard practice for mature engineering organisations.

AI systems inherit all of those risks and then introduce an entirely new set of their own. A machine learning model is not a library you can audit line by line. Training data cannot be pinned to a hash. Inference requests carry sensitive context through third-party providers whose internal controls are opaque. The attack surface is broader, less understood, and largely ungoverned.

This article maps the AI supply chain, examines its distinct threat vectors, and argues that infrastructure-level governance -- not perimeter security or bolted-on tooling -- is the only defensible approach.

Anatomy of the AI supply chain

Software supply chains are well-characterised: source code, dependencies, build systems, artefact registries, deployment pipelines. The AI supply chain shares some of these components but extends them considerably:

  • Model provenance: the origin of a model, including who trained it, on what data, with which hyperparameters, and whether the published weights match the claimed training run.
  • Training data: datasets sourced from public corpora, licensed data providers, internal data lakes, or synthetic generation pipelines. Each source carries its own integrity and licensing risks.
  • Fine-tuning and adaptation layers: retrieval-augmented generation (RAG) pipelines, vector databases, system prompts, and adapter weights that customise a base model's behaviour.
  • Inference infrastructure: the compute layer where models run, whether self-hosted, cloud-managed, or accessed through third-party API providers.
  • Agent frameworks and tool chains: orchestration layers that connect models to external tools, APIs, databases, and other models -- each integration introducing a new trust boundary.
  • API providers and routing: the commercial providers through which inference requests flow, each with their own data retention, logging, and access policies.

Every link in this chain is a potential point of compromise. Unlike traditional software dependencies, many of these components resist the verification techniques that software supply chain security relies on.

Threat vectors unique to AI systems

Model provenance and integrity

When an organisation downloads a model from a public hub, it is placing trust in a chain of custody that is rarely verifiable. Model weights are opaque numerical tensors. There is no equivalent of reading the source code.

Model poisoning attacks exploit this opacity. An attacker who gains access to training infrastructure, or who publishes a subtly modified model to a public registry, can embed backdoor behaviours that activate only on specific trigger inputs. These statistical triggers are nearly invisible to static analysis, undetectable by traditional code review, and survive standard evaluation benchmarks with no measurable performance degradation on clean inputs.

The OWASP Top 10 for LLM Applications (2025) lists supply chain vulnerabilities as a critical risk. The OpenSSF Model Signing (OMS) specification, released in 2025, represents early progress -- enabling cryptographic verification of model integrity using detached signatures that cover weights, configuration files, tokenisers, and datasets as a single verifiable unit. But adoption remains nascent, and most model distribution channels do not enforce signing.

The practical consequence: organisations routinely deploy models whose provenance is unverified and whose behaviour under adversarial conditions is unknown.

Prompt injection

Prompt injection is the injection attack of the AI era, and it occupies the number one position in the OWASP Top 10 for LLM Applications (2025) for good reason.

Direct prompt injection involves crafting inputs that override a model's system instructions -- causing it to ignore safety guidelines, reveal its system prompt, or produce outputs that serve the attacker's purposes.

Indirect prompt injection is more insidious. The adversarial payload is embedded in content the model processes as part of its context -- a document retrieved by a RAG pipeline, an email being summarised, a web page being analysed. The model cannot reliably distinguish between data it should process and instructions it should follow. This is not a misconfiguration; it is a structural limitation of current transformer architectures.

Real-world demonstrations have shown indirect prompt injection enabling data exfiltration from enterprise systems, privilege escalation through tool-calling interfaces, and zero-click remote code execution in development environments. Research evaluating eight distinct defence mechanisms found that adaptive attacks could bypass all of them, consistently achieving success rates above 50%.

For any system that processes external content -- which includes most production AI deployments -- prompt injection is not a theoretical risk but an active, largely unmitigated threat.

Data exfiltration and information leakage

Large language models memorise fragments of their training data. This is not a bug; it is a consequence of how these models learn statistical patterns. When that training data includes personally identifiable information, proprietary source code, API keys, or internal documents, the model becomes a potential exfiltration vector.

Fine-tuning amplifies this risk. Organisations that fine-tune models on internal data create systems that have demonstrably memorised sensitive information and may reproduce it in response to carefully constructed queries.

Beyond training data leakage, the context window itself is an attack surface. Every inference request packages context -- conversation history, retrieved documents, system prompts -- and sends it through the inference pipeline. If that pipeline crosses organisational boundaries, through a third-party API provider, for example, that context is exposed to the provider's infrastructure and data handling practices.

Agent impersonation and tool abuse

Agentic AI systems -- those that call external tools, execute code, query databases, or invoke APIs -- introduce risks with no direct analogue in traditional software supply chains.

When an agent is compromised through prompt injection or operates with excessive permissions, it acts with the authority granted to it by the system. A compromised agent can exfiltrate data through legitimate API calls, modify records, or invoke other agents -- all while appearing to operate normally.

The trust model for agentic systems differs fundamentally from traditional service-to-service authentication. An agent's behaviour is determined at inference time by its system prompt, the model's weights, and its current context. Any of these can be manipulated, and the resulting behaviour is non-deterministic.

Provider trust and data sovereignty

Most organisations access AI capabilities through third-party inference providers. Each API call transmits potentially sensitive data -- prompts, context, documents -- to infrastructure outside the organisation's control.

The visibility problem is acute. Organisations cannot verify how their data is handled after it reaches a provider's endpoint. Data retention policies vary. Logging practices differ. Jurisdictional boundaries may be crossed without the caller's knowledge, creating compliance exposure under regulations like GDPR.

When organisations use multiple providers -- increasingly common for cost optimisation, latency, or capability reasons -- the governance challenge multiplies. Each provider represents a distinct trust boundary, and routing decisions that consider only cost and latency ignore the most consequential variable: the sensitivity of the data being transmitted.

Why perimeter security is insufficient

Traditional network security operates on a clear model: define a perimeter, control what crosses it, inspect traffic at the boundary. This model breaks down for AI workloads for several reasons.

Every API call is a potential vector. Unlike traditional web requests where the payload format is constrained, AI inference requests carry arbitrary natural language that can encode adversarial instructions. There is no firewall rule that can distinguish a legitimate prompt from a prompt injection attack without understanding the semantic content.

The threat is in the content, not the protocol. Network-level inspection can identify malicious payloads in HTTP traffic because those payloads have structural signatures -- SQL injection has syntax, XSS has script tags. Prompt injection has no fixed syntax. The same words that constitute a legitimate request in one context constitute an attack in another.

Trust boundaries are dynamic. In agentic systems, the set of external services an AI system interacts with can change at inference time based on the model's reasoning. Static network policies cannot govern dynamic tool invocation.

Data classification must happen per request. The same AI system may handle public information in one request and highly confidential data in the next. Perimeter controls cannot make routing and access decisions at this granularity.

The implication is clear: AI security requires enforcement at the request level, not the network level. Every inference request must be evaluated against policy, routed appropriately, and logged for audit -- at wire speed, without introducing latency that degrades the system's utility.

Infrastructure-level governance as a primary defence

If perimeter security is insufficient, the alternative is governance infrastructure that operates inline with every AI interaction -- policy enforcement in the request path, evaluating every inference call against organisational rules before it reaches a provider.

This is the approach that Rai Shield takes -- acting as a governance layer that intercepts AI traffic and enforces policy at wire speed. But the architectural pattern matters more than any specific implementation. The key properties of an effective AI governance layer are:

Per-request policy enforcement. Every inference request is evaluated against a policy engine that considers the content's sensitivity, the destination provider's trust level, applicable regulatory requirements, and organisational rules. Decisions are made in milliseconds, inline with the request flow.

Content-aware routing. Requests containing sensitive data are automatically routed to providers that meet specific data handling requirements -- on-premises inference for highly confidential content, trusted cloud providers for general workloads. Routing decisions are policy-driven, not manually configured per integration.

Complete audit trails. Every request, every routing decision, every policy evaluation is logged with sufficient detail for compliance reporting and incident investigation. When a regulator or an internal audit function asks "where did this data go?", the answer must be immediate and precise.

Data sovereignty controls. For organisations operating across jurisdictions, the governance layer enforces geographical constraints on data flow. A request originating in the EU that contains personal data is routed only to inference infrastructure within approved jurisdictions, regardless of which provider offers the lowest latency.

Provider abstraction. By sitting between applications and inference providers, the governance layer decouples applications from specific providers. This enables rapid provider rotation in response to incidents and reduces the blast radius of any single provider compromise.

Rai Shield implements these properties using Rust and WebAssembly, deployed from edge locations to bare metal, precisely because governance enforcement must not become a performance bottleneck. Policy evaluation that adds hundreds of milliseconds to every request will be bypassed by engineering teams under latency pressure. Governance infrastructure must be fast enough to be invisible.

Practical recommendations

The following steps represent a reasonable starting point for organisations that take AI supply chain security seriously.

Establish model provenance verification. Before deploying any model, verify its integrity. Adopt the OpenSSF Model Signing specification where supported. For models sourced from public hubs, maintain an internal registry of approved models with verified checksums. Treat unverified models with the same suspicion you would treat unsigned software packages.

Implement request-level policy enforcement. Deploy governance infrastructure that evaluates every inference request against organisational policy. This is not optional monitoring -- it is inline enforcement. Define policies for data sensitivity classification, provider routing, content filtering, and rate limiting.

Classify data before it reaches the model. Identify sensitive content in prompts and context windows before they are transmitted to inference providers. PII detection, confidentiality classification, and regulatory tagging should happen at the governance layer, not as an afterthought.

Apply least-privilege to agentic systems. Grant agents the minimum tool access required for their function. Implement per-tool authorisation that is evaluated at request time, not configured statically. Log all tool invocations with full context for post-hoc review.

Maintain comprehensive audit trails. Log every inference request, including the full request context, routing decision, policy evaluation result, and provider response metadata. Store audit data in tamper-evident storage with retention periods that meet your regulatory requirements.

Conduct adversarial testing regularly. Red-team AI systems specifically for prompt injection, data exfiltration, and agent abuse. Standard penetration testing methodologies do not cover AI-specific attack vectors. Build or acquire AI-specific adversarial testing capabilities.

Govern multi-provider deployments explicitly. Maintain a provider registry that records each provider's data handling commitments, jurisdictional presence, and compliance certifications. Automate routing decisions based on this registry, and review it quarterly.

Plan for provider compromise. Develop incident response procedures specific to AI provider compromise: immediate traffic rerouting, audit log review for the exposure window, and communication procedures. The governance layer should support rapid provider isolation without application changes.

What comes next

AI supply chain security is where software supply chain security was a decade ago: risks well-documented by researchers, occasionally demonstrated in public incidents, but not yet systematically addressed by most organisations. The difference is that AI systems are being deployed far more rapidly than traditional software supply chains matured.

The organisations that will navigate this well are those that treat AI governance as infrastructure -- not a compliance checkbox, not a layer of manual review, but an automated, high-performance system that operates on every request, enforces policy consistently, and maintains the audit trails that regulators and incident responders will inevitably require.

The supply chain is only as strong as its weakest link. In AI systems, those links are numerous, novel, and not yet well-defended. The time to address them is before the inevitable breach that makes this obvious to everyone else.