Platform · LLM Gateway
Drop the AI Warden gateway in front of OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Google Vertex, and your self-hosted models. Hold provider keys server-side. Enforce per-team budgets. Scan every request and response for prompt injection, secrets, and PII. Send OpenAI-compatible traffic from clients you've already shipped.
Why a gateway
A gateway turns a fleet of direct provider calls — uncountable, unbounded, untraceable — into one well-governed pipe. You set policy once, in one place. Every team gets the same guarantees. Every audit row has a real owner.
A `sk-…` key committed to git, pasted into a Slack channel, exfiltrated from a stale `.env` file, or echoed in a CI step can end up on a paste site within hours. The blast radius is the whole provider account.
An agent retries on every error. A batch job runs ten times because the queue lost an ack. A dev forgets to cap `max_tokens`. The provider charges by the token, not the intent.
RAG pipelines, document summarisation, copilots — all read content the user did not write. Hidden instructions in that content can make a tool-using agent leak data, call dangerous tools, or escape its envelope.
Customer PII, API secrets, internal IDs, source code — all routinely pasted into prompts. The provider keeps a copy. Your DLP didn't see it because the egress looks like an ordinary HTTPS POST.
Cost control
The gateway sits in the path of every token. That makes cost a property of the platform, not a quarterly surprise. Set ceilings, cache smart, route cheap-when-cheap-is-fine, and turn the procurement bill into a dashboard.
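The ceiling logic itself is simple to sketch. The model names and per-million-token rates below are illustrative assumptions, not AI Warden's actual pricing table:

```python
# Hypothetical per-million-token rates; real rates come from the provider.
PRICE_PER_MTOK = {"fast-model": 0.15, "frontier-model": 2.50}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate one request's cost in USD from its token counts."""
    return (prompt_tokens + completion_tokens) * PRICE_PER_MTOK[model] / 1_000_000

def within_ceiling(spent_usd: float, cost_usd: float, ceiling_usd: float) -> bool:
    """A hard ceiling: the gateway refuses the call instead of forwarding it."""
    return spent_usd + cost_usd <= ceiling_usd
```

Because the check runs before the call is forwarded, a runaway retry loop burns requests against the ceiling, not against the provider bill.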
Key custody
The single biggest cause of an LLM-cost incident is a leaked key, full stop. AI Warden treats provider credentials like database credentials: they live in one place, they are rotated routinely, and no human or client ever sees them.
    # client.py — committed last week
    import openai

    openai.api_key = "sk-prod-7f2a-…"
    openai.chat.completions.create(...)
A real key, in clear text, in source. Rotation means hunting every repo, every CI runner, every laptop.
    # client.py — same shape, no provider key
    import os
    import openai

    openai.base_url = "https://gw.aiwarden.io/v1"
    openai.api_key = os.environ["AIW_PAT"]
    openai.chat.completions.create(...)
A scoped, expiring PAT. Mint a new one in seconds. Provider keys never leave the platform.
Prompt firewall
Two scanner pipelines run on every gateway hop — one on the request, one on the response. Configure each rule to flag, redact, or block. The same engine runs on MCP requests, so you write a rule once and apply it everywhere.
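A minimal sketch of that pipeline shape, with illustrative scanner names rather than the gateway's real API: each scanner returns an action, and the strictest one wins.

```python
from typing import Callable

SEVERITY = {"allow": 0, "flag": 1, "redact": 2, "block": 3}

def run_pipeline(text: str, scanners: list[Callable[[str], str]]) -> str:
    """Run every scanner over the payload and keep the most severe verdict."""
    verdict = "allow"
    for scan in scanners:
        action = scan(text)
        if SEVERITY[action] > SEVERITY[verdict]:
            verdict = action
    return verdict

def injection_scanner(text: str) -> str:
    # Toy rule: real detection uses many patterns plus classifiers.
    return "block" if "ignore previous instructions" in text.lower() else "allow"
```

The same `run_pipeline` call would run once over the request and once over the response.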
Detect known instruction-override patterns, role-confusion templates, and indirect-injection markers in tool inputs and RAG context.
Provider keys, AWS access keys, GitHub PATs, GCP service-account JSON, JWTs, and 70+ other patterns. Redact before they leave your network.
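Two of those pattern classes, sketched as simplified regexes (the shipped scanner covers far more cases and edge shapes):

```python
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub personal access token
]

def redact_secrets(text: str) -> str:
    """Replace anything matching a known secret shape before egress."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```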
Names, emails, phone numbers, national IDs, payment card numbers. Locale-aware (UK, EU, US, India, Singapore). Redact in-flight or block by classification.
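Card-number detection typically pairs a digit-pattern match with the Luhn checksum, the standard validity test that filters out lookalike digit runs. A minimal version:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 12:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```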
Volume, latency, model-mix, and tool-call shape baselines per principal. Page when an agent strays from its envelope.
Bring your own regex, your own classifier, or your own LLM-judge rule. Hot-reload without restarting the gateway. Per-rule action and per-rule blast-radius preview before you ship.
Same scanner library, applied to model output. Stop the assistant from quoting back the secret it found, or from emitting content that violates your policy.
Model routing
OpenAI-compatible in, any model out. Route by team, by route, by content classification, by cost ceiling, or by quality. Fail over between providers without rewriting clients.
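Cost-aware routing reduces to picking the cheapest candidate that clears the quality bar. A sketch with made-up model names and tiers:

```python
MODELS = [
    {"name": "fast-model", "quality": 1, "cost_per_mtok": 0.15},
    {"name": "frontier-model", "quality": 3, "cost_per_mtok": 2.50},
]

def route(min_quality: int) -> str:
    """Cheapest model whose quality tier meets the floor."""
    candidates = [m for m in MODELS if m["quality"] >= min_quality]
    return min(candidates, key=lambda m: m["cost_per_mtok"])["name"]
```

Failover is the same selection run again with the failed provider's models removed from the candidate list.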
Observability
Each request lands in the immutable audit log and the ClickHouse request log within milliseconds. Stream a copy to your SIEM, your data lake, or your existing observability stack — same structured payload, no proprietary format.
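The structured payload might look like the row below; the field names are illustrative, not the gateway's actual schema.

```python
import json

# One hypothetical audit-log row, as it would be streamed to a SIEM.
row = {
    "ts": "2025-01-01T12:00:00Z",
    "principal": "team-payments/ci-bot",
    "model": "gpt-4o-mini",
    "route": "/v1/chat/completions",
    "verdicts": {"request": "redact", "response": "allow"},
    "prompt_tokens": 412,
    "completion_tokens": 96,
}
print(json.dumps(row, sort_keys=True))
```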
Integrate
Point your existing OpenAI / Anthropic / Bedrock SDK at the gateway. The wire format is identical to the upstream provider's — every existing client (curl, Python SDK, LangChain, OpenAI Node, Vercel AI SDK) keeps working.
Personal access token from the portal, OIDC client credentials for service principals. Scope to a team, a set of models, a budget. One click to revoke.
Start in the default mode — observe, scan, log, no enforcement. Promote rules to flag, then to block, with blast-radius preview before you ship.
Every prompt, every completion, every scanner verdict, every cost line. Filter by principal, model, team, route, time, verdict — all in ClickHouse, all in the portal.
Specification
Next step
A 45-minute working session. Your provider, your network, your model. Leave with a working gateway, a PAT in CI, and a real policy.