Platform · MCP Fleet

Every tool call,
every agent, on the record.

Model Context Protocol gives agents real power — file systems, databases, CRMs, payment APIs. AI Warden puts a supervisor in front of every MCP server in your fleet, so power doesn’t outrun policy. One registry. One firewall. One signed request log.

The problem

MCP makes agents more capable than your last security review assumed.

A single MCP server can expose your production database, your customer ticketing system, or your payment APIs to a model in three lines of config. Most teams find out who connected what only after something breaks.

Unscoped tools

An agent has more privilege than the analyst who launched it

An MCP server connected to postgres://prod with read+write makes every connected agent a database admin. Without a warden, scope is “whatever the connection string allows”.

Control · Per-agent allow-list at the MCP method level. Default deny. Auto-deny on first-seen tools.
SQL & command injection

The model is now writing your queries

Models will compose DROP TABLE, rm -rf, and shell pipes when prompted to “clean up” or “reset”. Your database driver doesn’t care that an LLM wrote it.

Control · Inline SQLi / shell-injection scanner. 403 the call, log the prompt, alert the owner.
Data exfiltration

Tools become a side-channel out of your VPC

Any MCP server with outbound HTTP is a potential exfil pipe. A prompt-injected agent will paste your secrets into a Slack webhook if you let it.

Control · Egress allow-list per server. Outbound DLP scanner. Block on classified content.
No audit trail

“Who did what, when?” has no answer

Stock MCP gives you stdio between client and server. No request ID. No principal. No retention. When something goes wrong, the trail is in someone’s terminal scrollback.

Control · Signed, immutable request log. Principal, agent, tool, args, result, latency, verdict.

Registry

Know what’s deployed.
Know who can reach it.

Every MCP server in the fleet is a first-class object: an owner, a tier, a data classification, a list of approved consumers. Auto-discover what’s already running. Block what isn’t in the registry.

  • Auto-discover MCP servers from Kubernetes, ECS, and developer machines
  • Per-server tier (sandbox / staging / production / regulated)
  • Data classification tags drive scanner activation and approval flows
  • Owner of record — mandatory; default-deny without one

Registry · 147 servers

  • postgres-prod · regulated · data-platform
  • github-readonly · sandbox · devex
  • jira-rw · production · customer-ops
  • filesystem-tmp · sandbox · research
  • stripe-live · regulated · payments
  • slack-broadcast · production · internal-comms
  • unknown-fs-7a · unregistered · auto-blocked

Hosting & sandboxing

Three ways an MCP server reaches the registry — all of them scanned.

A clean fleet doesn’t happen by policy alone. AI Warden gives you a path for the source you control, a path for the binaries someone else is running, and a path for the third-party endpoints you can’t touch. Every one ends with a server that has been stared at by a static analyser, a sandbox runner, or a live proxy — usually all three.

Path 1 · Self-hosted by AI Warden

Submit source → scan → sandbox → publish

Point us at a Git repo or container image. We pull it, run a static code scan against the source, build the image, then stand the server up inside a network-isolated sandbox runner. Live scanners watch the first traffic. Only then does it get a published URL on your fleet.

  • Static code scan — supply-chain risk, dangerous tool definitions, suspicious file/network patterns, secrets in source.
  • Sandbox build & run — isolated network, no outbound by default, ephemeral filesystem.
  • Live behavioural scan — synthetic prompts exercise every tool while the scanner panel watches.
  • Approver sign-off — four-eyes gate before the server is published to consumers.
  • Versioned deploys — every release re-runs the same gate; rollback is one click.

Path 2 · Self-run, registered with us

Register existing remote → proxy live → scan continuously

The server is already running in your VPC, your cluster, or a partner’s. Register the URL, owner, tier, and data classification. The supervisor proxies every call through the full scanner stack — you get the same enforcement and audit story without re-hosting.

  • Live proxy scanning — SQLi, prompt-injection, secrets, PII, and anomaly scanners applied to every request & response.
  • Method discovery — the registry learns the tool surface from real traffic and flags drift.
  • Egress allow-list — outbound network from the registered server constrained to declared dependencies.
  • Health probes — a server that stops answering is auto-quarantined; agents get a clean error.

Path 3 · Third-party MCP / SaaS

Vendor endpoint → vetted → proxied through us

You don’t own the server, but your agents need its tools. Add the vendor endpoint to the registry; we tag it third-party, raise the scanner panel to the regulated preset, and put it behind a stricter egress allow-list. Agents reach it the same way they reach everything else: through aiw-gw.

  • Vendor record — sub-processor, data residency, contract terms attached to the registry entry.
  • Higher scanner floor — PII redaction, secret-detection, output DLP all default-on.
  • Spend visibility — per-call attribution, so “why did the bill spike” has a one-page answer.
  • Kill switch — revoke at the registry, every agent loses access in the next request, no client redeploy.

Sandbox lifecycle · customer-ops/jira-summariser-mcp · published

Stage · Result · Detail

  • Source pull · ok · git@…jira-summariser-mcp.git · sha 2c4f1a8
  • Static code scan · 0 critical · 2 medium — broad fs.read permission, mitigated by tier rules
  • SBOM & supply chain · signed · 142 deps · 0 known CVEs · in-toto attestation attached
  • Image build · ok · distroless · non-root · read-only rootfs
  • Sandbox run · ok · network-isolated · ephemeral fs · 38 synthetic prompts
  • Behavioural scan · 1 flag · One tool returned a secrets-shaped string — auto-redactor proven, then approved
  • Four-eyes review · approved · a.morgan · l.shah · 2026-05-07T16:41Z
  • Publish · live · aiw-gw/mcp/jira-summariser · tier=production

Supervisor

Inline. Native. No client patch.

The warden speaks MCP natively. Point the agent at aiw-gw instead of the underlying server, and every method call is intercepted, scanned, scoped, logged, and either forwarded or denied. The agent never knew the server moved.

  • HTTP & SSE transports — the centralised remote-MCP shape your fleet should use anyway
  • Low-millisecond scanner overhead — p50 ~3 ms, p99 < 8 ms end-to-end
  • Hot-reload of policy and scanners without dropping a connection
  • Per-agent rate-limit, budget, and concurrency caps

Scanner library

Every tool call passes a panel of scanners.

Scanners run inline on the request, the response, or both. Stack them by server tier — a sandbox filesystem server gets the basics; a production payments server gets the lot.

SQL & shell injection

Pattern + AST analysis on tool arguments. Blocks the canonical “OR 1=1”, UNION SELECT, backtick command substitution, and shell pipes.
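
A toy version of the pattern half of such a scanner — the patterns below are illustrative only; a production scanner also parses arguments into an AST rather than relying on regexes alone:

```python
import re

# Illustrative signatures for the cases named above; not an exhaustive list.
SQLI_PATTERNS = [
    re.compile(r"\bOR\s+1\s*=\s*1\b", re.I),        # canonical tautology
    re.compile(r"\bUNION\s+SELECT\b", re.I),        # UNION-based extraction
    re.compile(r"`[^`]*`"),                         # backtick command substitution
    re.compile(r"[;|]\s*(rm|curl|wget|sh|bash)\b"), # chained shell commands
]

def flags_injection(argument: str) -> bool:
    """True if any known-bad shape appears in a tool argument."""
    return any(p.search(argument) for p in SQLI_PATTERNS)
```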

Prompt injection & jailbreak

Detects override attempts in tool inputs (“ignore previous instructions”), zero-width unicode, and known jailbreak prompts. Blocks before the call leaves the agent.

Secrets & tokens

Regex + entropy on inputs and outputs. Catches API keys, JWTs, AWS access keys, and PEM blocks. Redacts inline; never persists raw.
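
A minimal sketch of the regex + entropy approach — the key shapes and the entropy threshold are assumptions for illustration, not the shipped rule set:

```python
import math
import re

KEY_SHAPES = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                   # AWS access key id
    re.compile(r"eyJ[\w-]+\.[\w-]+\.[\w-]+"),          # JWT
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), # PEM block header
]

def shannon_entropy(s: str) -> float:
    """Bits per character; random tokens score high, prose scores low."""
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def redact(text: str, entropy_floor: float = 4.0) -> str:
    """Replace matched or high-entropy tokens inline; never persist raw."""
    for pat in KEY_SHAPES:
        text = pat.sub("[REDACTED]", text)
    return " ".join(
        "[REDACTED]" if len(t) >= 20 and shannon_entropy(t) > entropy_floor
        else t
        for t in text.split())
```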

PII & regulated data

Names, emails, national IDs, card numbers, health identifiers. Tag-driven — activate per-server based on data classification in the registry.

Behavioural anomaly

Per-agent baseline of method mix, query shapes, output size. Score every call. Auto-pause on ≥3σ deviation; alert the owner.
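
The ≥3σ rule reduces to a z-score against a rolling baseline. A toy single-metric version — window size and threshold here are illustrative assumptions:

```python
import statistics

class Baseline:
    """Rolling per-agent baseline of one metric (e.g. rows returned)."""
    def __init__(self, window: int = 500):
        self.values: list[float] = []
        self.window = window

    def score(self, x: float) -> float:
        """Z-score of x against history, then record x in the window."""
        if len(self.values) >= 2:
            mu = statistics.fmean(self.values)
            sigma = statistics.pstdev(self.values) or 1.0  # guard zero spread
            z = abs(x - mu) / sigma
        else:
            z = 0.0  # not enough history to judge
        self.values = (self.values + [x])[-self.window:]
        return z

def should_pause(z: float) -> bool:
    return z >= 3.0  # auto-pause on >=3 sigma deviation
```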

Custom rules

YAML or WASM plugins. Enforce internal taxonomies, reference your DLP service, or call out to a model judge for nuanced classification.

Request log

One signed event per call.
One source of truth for incidents.

Every MCP request is recorded in ClickHouse with sub-second query latency. Append-only, hash-chained, exportable to your SIEM. The same log that powers the dashboard is the log your auditor sees.

  • Principal · agent · tool · arguments · response · verdict · latency
  • Hash chain with periodic anchoring — any tampering is detectable
  • Sinks: Splunk · Sentinel · Elastic · S3 · Snowflake · Datadog
  • Retention configurable per data classification (default 7 years for regulated)
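
Hash-chaining itself is simple: each entry commits to the previous entry's hash, so editing any event invalidates every later link. A schematic version — the real log also signs and periodically anchors entries, which this sketch omits:

```python
import hashlib
import json

def append_event(chain: list[dict], event: dict) -> dict:
    """Append one log event, linking it to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(event, sort_keys=True)
    entry = {"prev": prev, "event": event,
             "hash": hashlib.sha256((prev + body).encode()).hexdigest()}
    chain.append(entry)
    return entry

def verify(chain: list[dict]) -> bool:
    """Recompute every link; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True)
        if (entry["prev"] != prev or
                entry["hash"] != hashlib.sha256((prev + body).encode()).hexdigest()):
            return False
        prev = entry["hash"]
    return True
```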

Live · last 60s

  • 10:14:02  200  sp-92ab · postgres.query · 142 rows
  • 10:14:02  200  sp-7c11 · jira.search · 18 rows
  • 10:14:03  redact  sp-92ab · postgres.query · 1 PII
  • 10:14:04  403  sp-3c0d · postgres.query · sqli
  • 10:14:05  200  sp-7c11 · github.search · 8 files
  • 10:14:05  429  sp-d18a · stripe.charges · rate-limit
  • 10:14:06  403  sp-d18a · fs.write · scope

Integrate

Four steps. Pilot in a day.

  1. Submit or register the server

     Two paths. Submit a Git repo — AI Warden runs static code scans, builds the image, and stands the server up in a sandbox for a live behavioural scan before it’s published. Register an existing remote MCP — we record the owner, tier, and data classification, and proxy every call through the supervisor with the same scanner stack. Either way, the warden refuses to forward to anything that hasn’t cleared the registry.

  2. Mint a system principal for the agent

     One principal per agent. Federated to your IdP. Scope it to the MCP methods it needs — nothing more.

  3. Repoint the client

     Change one URL in your agent config from the raw server to aiw-gw. Native MCP — no SDK swap.

  4. Watch the request log

     Tail the live stream. First 200s in seconds. First 403 inside an hour. Tune scope from real traffic.
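
The repoint step is the entire client-side change. A sketch assuming a JSON-style agent config with an `mcpServers` map — the key name, hostnames, and the `repoint` helper are illustrative:

```python
# Hypothetical agent config; only the URL changes, the MCP client is untouched.
before = {
    "mcpServers": {
        "jira": {"transport": "http",
                 "url": "https://jira-mcp.internal.example/mcp"},
    }
}

def repoint(config: dict, gateway: str = "https://aiw-gw.internal.example") -> dict:
    """Route every server through the supervisor gateway, keyed by server name."""
    out = {"mcpServers": {}}
    for name, server in config["mcpServers"].items():
        out["mcpServers"][name] = {**server, "url": f"{gateway}/mcp/{name}"}
    return out
```

Everything else about the server entry (transport, name, any auth the client attaches) is preserved; only the URL now points at the gateway.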

Spec

Production fundamentals.

Protocol · Native MCP over HTTP & SSE — the centralised remote-MCP transport. No client SDK changes.
Identity · OIDC m2m · PAT · system principal federated to Keycloak / Okta / Entra ID / Auth0
Scope · Per-agent allow-list at the MCP method level; default deny
Scanners · SQLi · prompt-injection · secrets · PII · anomaly · custom YAML / WASM
Budget · Per-agent and per-server caps, with auto-pause on breach
Audit · ClickHouse · hash-chained · 7-year retention default · 5+ SIEM sinks
Latency · p50 ~3 ms · p99 < 8 ms scanner overhead
Deployment · Self-hosted (k8s, ECS, plain VM) or AI-Warden-hosted SaaS · single binary
HA · Stateless · scale horizontally · graceful drain on policy reload

From discovery to enforcement

Find every MCP server.
Govern every one.

A 30-minute working session: we point the warden at your environment, discover what’s deployed, and stand up policy on one server end-to-end. You walk away with a real request log against your own traffic.