Technical deep dive · 10 min read
The governance gap in agentic AI
The shift from request-response AI to autonomous agents exposes a governance gap that existing tools were never designed to close.
Most AI governance today was designed for a world that is already passing. A person types a prompt, an API call goes out, a response comes back. Rate limits, content filters, usage quotas -- these controls assume a human is initiating each request, that interactions are discrete and observable, and that a single provider handles each exchange from start to finish.
That world is giving way to something fundamentally different. AI agents -- systems that chain tool calls, make routing decisions, and act autonomously on behalf of organisations -- are moving from research prototypes into production workloads. And the governance tooling that organisations rely on has not kept pace with what these agents actually do.
This is not a matter of scaling existing controls. It is a qualitative shift in how AI systems interact with infrastructure, data, and each other. Addressing it requires rethinking where governance happens, what it observes, and how it enforces policy.
From request-response to autonomous workflows
The first generation of AI integration was straightforward. An application sends a prompt to a large language model API. The model returns a completion. The application displays it. Governance in this model is relatively simple: you control who can call the API, what they can send, and how much they can use.
Agentic AI changes every one of those assumptions. An agent does not make a single API call. It receives a high-level objective -- "research these three vendors and draft a procurement comparison" -- and decomposes it into a chain of actions. It might query a language model for an initial plan, call a web search tool, retrieve documents from an internal knowledge base, invoke a different model for summarisation, write intermediate results to a shared workspace, and then call yet another model to compose the final output.
Each step involves decisions. Which model to use for which subtask. Whether to retrieve additional context. When to escalate or ask for clarification. Whether to invoke external tools or delegate to another agent. These are routing and access decisions that, in a non-agentic system, would be made by engineers at design time. In an agentic system, they happen at runtime, autonomously, and at machine speed.
The implication for governance is stark. A single high-level request can fan out into dozens of API calls across multiple providers, touching multiple data stores, crossing network and jurisdictional boundaries -- all without any human reviewing the intermediate steps. The governance tooling that watches individual API calls sees the leaves of this tree, not the tree itself.
Where traditional governance breaks down
Consider a concrete scenario. An organisation deploys an agent to assist its legal team with contract review. The agent receives a contract document, extracts key clauses, compares them against the organisation's standard terms, identifies risks, and drafts a summary with recommendations.
To do this, the agent might:
- Send the contract text to a language model for clause extraction.
- Query a vector database containing the organisation's standard terms.
- Route specific clause comparisons to a specialised model that handles legal reasoning.
- Call a summarisation model to produce the final risk assessment.
- Write the output to a document management system.
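The fan-out described above can be sketched as a chain of tool calls. The function names and the shape of each step are illustrative stand-ins, not a real agent framework:

```python
# Hypothetical sketch of the contract-review workflow as chained tool calls.
# Every function below is a stand-in for a real model or service call.

def extract_clauses(contract_text: str) -> list[str]:
    """Stand-in for a language-model call that extracts key clauses."""
    return [line for line in contract_text.splitlines() if line.startswith("Clause")]

def lookup_standard_terms(clause: str) -> str:
    """Stand-in for a vector-database query against standard terms."""
    return f"standard terms for: {clause}"

def compare_clause(clause: str, standard: str) -> str:
    """Stand-in for a specialised legal-reasoning model."""
    return f"risk assessment of '{clause}' vs '{standard}'"

def summarise(findings: list[str]) -> str:
    """Stand-in for a summarisation model."""
    return f"{len(findings)} clauses reviewed"

def review_contract(contract_text: str) -> str:
    clauses = extract_clauses(contract_text)               # call 1: clause extraction
    findings = []
    for clause in clauses:
        standard = lookup_standard_terms(clause)           # call per clause: vector DB
        findings.append(compare_clause(clause, standard))  # call per clause: legal model
    return summarise(findings)                             # final call: summarisation
    # the write to the document management system is omitted for brevity

print(review_contract("Clause 1: payment terms\nClause 2: liability cap"))
```

Even in this toy form, one high-level request fans out into 1 + 2n + 1 downstream calls for n clauses -- and each per-provider log sees only its own slice.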
Now apply traditional governance controls to this workflow. API rate limiting sees five separate API calls -- it has no concept that they form a single workflow with a single purpose. Content filtering at each API boundary checks whether individual requests contain prohibited content, but it cannot evaluate whether the overall workflow is appropriate. Usage quotas track token consumption per provider but cannot answer the question, "how much did this contract review cost across all providers?"
More critically, none of these controls can enforce a policy like, "contract documents classified as confidential must only be processed by models deployed within the EU." The agent makes routing decisions at runtime. By the time a content filter at the API boundary sees the request, the routing decision has already been made. The governance check arrives too late to prevent the violation -- it can only detect it after the fact, if it detects it at all.
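The residency policy described here only works if it is evaluated before the routing decision, not at the API boundary afterwards. A minimal sketch, in which the policy table, provider names, and classification labels are all hypothetical:

```python
# Minimal sketch of a residency rule evaluated *before* routing, so a
# non-compliant call never executes. All names here are invented.

ALLOWED_REGIONS = {"confidential": {"eu-west-1", "eu-central-1"}}

PROVIDERS = {
    "fast-model": "us-east-1",
    "eu-model": "eu-west-1",
}

def route(provider: str, classification: str) -> str:
    """Check residency policy before the request leaves the organisation."""
    region = PROVIDERS[provider]
    allowed = ALLOWED_REGIONS.get(classification)
    if allowed is not None and region not in allowed:
        raise PermissionError(
            f"{classification} data may not be processed in {region}"
        )
    return region  # the call may proceed

print(route("eu-model", "confidential"))  # allowed: eu-west-1
# route("fast-model", "confidential") raises before any data crosses the boundary
```

A content filter sitting at the provider's API boundary cannot implement this check: by then, the routing choice is already made.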
This is the governance gap: the space between what agents actually do and what existing governance tools can observe and control.
The visibility problem
Visibility is the foundation of governance. You cannot enforce policies over actions you cannot see. And agentic workflows are, by design, opaque to tools that observe at the level of individual API calls.
When an agent chains 50 API calls across three providers to fulfil a single task, the full picture of what happened -- the decision tree, the data that flowed between steps, the routing choices, the tools that were invoked -- exists nowhere. Each provider sees its own requests. The orchestrating application might log high-level task completion. But the connective tissue between these observations is missing.
This creates several concrete problems.
Audit trails are fragmented. If a regulator asks, "show me every system that processed this individual's personal data," the answer requires reconstructing a workflow that spans multiple providers, tools, and data stores. With current tooling, this reconstruction is manual, error-prone, and often incomplete.
Anomaly detection is blind to workflows. A single API call that sends a large document to a language model looks identical whether it is part of an authorised contract review workflow or an unauthorised data exfiltration. Without workflow-level context, security monitoring generates either too many false positives (flagging every large request) or too many false negatives (missing genuinely problematic workflows).
Cost attribution is approximate at best. When agents make autonomous routing decisions -- choosing between providers based on latency, capability, or cost -- tracking the actual cost of a business operation requires correlating calls across providers. Most organisations cannot do this today.
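The cost-attribution problem reduces to correlation: per-call records from different providers must share a workflow identifier before they can be summed. A sketch with an invented record shape and invented prices:

```python
# Sketch of workflow-level cost attribution: per-call records from several
# providers, correlated by a shared workflow identifier. The record shape
# and costs are illustrative.

from collections import defaultdict

calls = [
    {"workflow_id": "review-42", "provider": "provider-a", "cost_usd": 0.012},
    {"workflow_id": "review-42", "provider": "provider-b", "cost_usd": 0.031},
    {"workflow_id": "review-42", "provider": "provider-c", "cost_usd": 0.004},
    {"workflow_id": "other-7",   "provider": "provider-a", "cost_usd": 0.020},
]

def cost_by_workflow(records: list[dict]) -> dict[str, float]:
    totals = defaultdict(float)
    for rec in records:
        totals[rec["workflow_id"]] += rec["cost_usd"]
    return dict(totals)

print(cost_by_workflow(calls))
```

The hard part is not the summation but the correlation: without a workflow identifier propagated into every provider's logs, there is nothing to group by.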
The problem compounds as agents become more capable. Multi-agent systems, where agents delegate subtasks to other agents, add another layer of indirection. An orchestrating agent might invoke a research agent, which invokes a retrieval agent, which calls three different APIs. The chain of delegation makes visibility exponentially harder.
Data sovereignty in agent workflows
Data residency and sovereignty requirements add another dimension to the governance gap. Regulations like the GDPR, the UK Data Protection Act, and sector-specific rules impose constraints on where data can be processed and stored. These constraints are non-negotiable, and the penalties for violation are substantial.
In a traditional API integration, data residency is addressed at design time. Engineers select a provider and region, configure the integration, and the data flow is deterministic. Compliance can be verified through architecture review.
Agents break this model. An agent that dynamically routes requests based on model capability, latency, or availability might send a query containing personal data to whichever provider responds fastest. If that provider processes the request in a jurisdiction that violates the organisation's data residency requirements, the violation is invisible until -- or unless -- someone audits the routing logs.
This is not a hypothetical concern. As organisations adopt multi-provider strategies for resilience and cost optimisation, the number of potential routing paths grows. An agent orchestrating across providers in the US, EU, and Asia-Pacific must make jurisdictionally aware routing decisions for every request that contains regulated data. This is a governance requirement that must be enforced at the infrastructure layer, in real time, before the request leaves the organisation's control.
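One way to make routing jurisdictionally aware is to filter the candidate pool by residency constraint before any latency- or cost-based selection, so the fastest provider is only ever chosen from compliant options. The provider names, regions, and latencies below are hypothetical:

```python
# Sketch of jurisdiction-aware routing: filter candidates by residency
# constraint first, then optimise for latency. All values are invented.

providers = [
    {"name": "us-fast", "region": "us-east-1",      "latency_ms": 40},
    {"name": "eu-main", "region": "eu-west-1",      "latency_ms": 90},
    {"name": "apac",    "region": "ap-southeast-1", "latency_ms": 120},
]

def pick_provider(pool: list[dict], allowed_regions: set[str]) -> dict:
    candidates = [p for p in pool if p["region"] in allowed_regions]
    if not candidates:
        raise RuntimeError("no compliant provider available")
    return min(candidates, key=lambda p: p["latency_ms"])

# For data restricted to the EU, the faster US provider is never considered.
choice = pick_provider(providers, allowed_regions={"eu-west-1", "eu-central-1"})
print(choice["name"])
```

The ordering matters: filtering after optimisation ("pick fastest, then check") leaves a window in which the non-compliant choice has already been made.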
Policy enforcement after the fact -- detecting violations in logs and remediating them -- is insufficient for data sovereignty. The data has already crossed the boundary. The processing has already occurred. The violation is complete.
What governance-first means for agents
Addressing the governance gap requires a shift in where and how policy enforcement happens. Rather than applying governance as a layer on top of completed actions, governance must be embedded at every interaction point in the agent workflow.
This means several things in practice.
Policy enforcement at every tool call. When an agent invokes a tool -- whether it is an API call to a language model, a database query, or a message to another agent -- governance policy must be evaluated before the call executes. This includes data classification checks, jurisdictional routing constraints, access control policies, and content policies. The enforcement point must sit in the execution path, not beside it.
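Putting the enforcement point in the execution path can be as simple as wrapping every tool so that policy is evaluated before, not after, the call runs. A hedged sketch -- the policy function, tool, and argument shape are all assumptions for illustration:

```python
# Sketch of enforcement in the execution path: a denied tool call never
# executes. The policy logic and tool names are illustrative.

from typing import Callable

def enforce(policy: Callable[[str, dict], bool],
            tool: Callable[[dict], str],
            name: str) -> Callable[[dict], str]:
    """Wrap a tool so policy is evaluated before execution, not beside it."""
    def guarded(args: dict) -> str:
        if not policy(name, args):
            raise PermissionError(f"policy denied tool call: {name}")
        return tool(args)
    return guarded

def my_policy(tool_name: str, args: dict) -> bool:
    # Example rule: confidential data must never reach external search.
    return not (args.get("classification") == "confidential"
                and tool_name == "external_search")

def external_search(args: dict) -> str:
    return f"searched for {args['query']}"

guarded_search = enforce(my_policy, external_search, "external_search")
print(guarded_search({"query": "vendor pricing", "classification": "public"}))
# A confidential query raises PermissionError before the tool runs.
```

The key property is that the guard sits between the agent and the tool: there is no code path on which the tool executes without the policy check.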
Workflow-level observability. Governance tooling must correlate individual actions into coherent workflows. This requires propagating context -- a trace identifier, a classification level, an originating policy scope -- across every step of the agent's execution. The observability system must understand that 50 API calls constitute one contract review, not 50 independent requests.
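The propagation described here can be sketched as a small context object carried through every step, so the observability layer can reassemble many calls into one workflow. The field names and span shape are assumptions, not a real schema:

```python
# Sketch of context propagation: one trace identifier carried across every
# step of a workflow, so 50 calls can later be correlated into one task.

import uuid
from dataclasses import dataclass, field

@dataclass
class GovernanceContext:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    classification: str = "internal"

spans: list[dict] = []  # stand-in for an observability backend

def record(ctx: GovernanceContext, step: str) -> None:
    spans.append({"trace_id": ctx.trace_id, "step": step,
                  "classification": ctx.classification})

ctx = GovernanceContext(classification="confidential")
for step in ["extract_clauses", "vector_lookup", "compare", "summarise"]:
    record(ctx, step)  # every downstream call carries the same trace_id

# Four calls, one trace id: the backend can see one workflow, not four requests.
print(len({s["trace_id"] for s in spans}))
```

This is essentially distributed tracing with governance metadata riding alongside the trace identifier; in practice a standard such as W3C trace context would carry it across process boundaries.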
Policy as infrastructure, not configuration. In agentic systems, governance policies are not static rules applied at deployment time. They are dynamic constraints evaluated at runtime against the current context: the data being processed, the provider being called, the jurisdiction in play, the classification level of the workflow. This demands governance infrastructure that can evaluate policies at machine speed, with latency low enough to sit in the critical path without degrading the agent's performance.
Governance across trust boundaries. When an agent delegates to another agent, or hands off context to a different provider, the governance context must travel with the request. The receiving system must know what policies apply, what classification the data carries, and what constraints govern its processing. Without this, every trust boundary becomes a governance gap.
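Making the governance context travel with the request means serialising it into the request itself, so the receiving system can read and honour it. The header name and context fields below are invented for illustration, not a standard:

```python
# Sketch of governance context crossing a trust boundary: the policy scope
# and classification are serialised into the outbound request, and the
# receiver validates them before processing. Header names are hypothetical.

import json

def attach_context(payload: dict, ctx: dict) -> dict:
    """Serialise governance context onto an outbound request."""
    return {
        "headers": {"x-governance-context": json.dumps(ctx)},
        "body": payload,
    }

def receive(request: dict) -> dict:
    """Receiving side: reject requests whose context violates local rules."""
    ctx = json.loads(request["headers"]["x-governance-context"])
    if ctx["classification"] == "confidential" and not ctx.get("eu_only"):
        raise ValueError("confidential data arrived without residency scope")
    return ctx

request = attach_context(
    {"task": "summarise clause"},
    {"classification": "confidential", "eu_only": True, "trace_id": "abc123"},
)
print(receive(request)["trace_id"])
```

Without something like this, the receiving agent or provider has no way to know what policies govern the data it was just handed -- which is exactly how a trust boundary becomes a governance gap.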
This is the architectural challenge that motivated the design of systems like Rai Shield -- a governance layer that operates at wire speed, enforcing policy at every interaction point rather than monitoring after the fact. When agents make dozens of routing and tool-call decisions per second, the governance layer must be fast enough to keep pace without becoming a bottleneck, and comprehensive enough to see the full workflow, not just individual requests.
As we explore agent orchestration with Arai, these requirements become even more demanding. Coordinating multiple agents with full visibility and governance means building observability and policy enforcement into the orchestration fabric itself -- not bolting it on after the architecture is set.
The case for open governance infrastructure
There is one more dimension to this problem that deserves direct attention: the governance layer itself must be trustworthy.
When governance was a matter of rate limiting and content filtering, the stakes of a governance failure were bounded. A missed content filter might let through an inappropriate response. A missed rate limit might result in an unexpected bill. These are problems, but they are contained problems.
When agents are making autonomous decisions about sensitive data -- routing personal information across jurisdictions, accessing internal systems, delegating tasks to external providers -- the governance layer becomes a critical trust component. It sees everything. It makes enforcement decisions about every interaction. It holds the audit trail for every action the agent takes.
This creates what might be called a trust paradox for proprietary governance. If the system responsible for enforcing your governance policies is itself opaque -- if you cannot inspect how it makes enforcement decisions, cannot verify that it correctly implements your policies, cannot audit its own behaviour -- then you have not solved the governance problem. You have moved it.
Open governance infrastructure resolves this. When the governance layer is open source, organisations can audit the enforcement logic, verify that policies are correctly implemented, inspect the decision path for any given enforcement action, and extend the system to meet requirements that the original authors did not anticipate.
This is not an abstract preference. Regulatory frameworks increasingly require organisations to demonstrate how their AI governance works, not merely that it exists. The ability to point to auditable, inspectable governance infrastructure -- and to show exactly how a policy was enforced for a specific interaction -- is becoming a compliance requirement in its own right.
Open governance infrastructure also enables the kind of collaborative development that this problem demands. No single organisation has encountered every governance scenario that agentic AI will create. An open platform allows the governance community -- engineers, compliance teams, regulators, and researchers -- to contribute policies, share patterns, and build on each other's work.
The road ahead
The shift to agentic AI is not waiting for governance to catch up. Organisations are deploying agents into production today, often with governance tooling designed for a simpler era. The gap between what agents do and what governance tools can observe and control is real, and it is growing.
Closing this gap requires treating governance as a first-class infrastructure concern, not an afterthought. It means building governance into the agent execution path, not wrapping it around the outside. It means investing in workflow-level observability, not just request-level logging. It means designing governance systems that are fast enough for machine-speed decision-making and open enough for genuine auditability.
The organisations that get this right will be the ones that can deploy agentic AI with confidence -- not because they trust the agent to do the right thing, but because their infrastructure ensures it.