Why · 01

The two failure modes
every CFO and CISO sees first.

Leaked API keys and runaway spend are the most common, most embarrassing, and most preventable incidents in enterprise LLM adoption. They share a single root cause — raw vendor credentials are loose in the developer environment — and a single fix.

Indicative patterns from discovery assessments

Two recurring patterns.

Key custody

~12-40

Distinct OpenAI / Anthropic / vendor keys often discovered per organisation in first-pass assessments, across CI configs, developer machines, and forgotten Lambda environment variables. Ownership is frequently unclear for 40-60% of them.

Spend attribution

~60-80%

Estimated share of LLM spend finance teams initially can’t attribute beyond “Engineering / OpenAI” on the vendor invoice. Per-team, per-product, and per-agent breakdowns are often reconstructed manually, weeks late.

Ranges are directional field estimates from recent discovery work and should be treated as planning inputs, not guarantees.

How keys leak

Six places we find raw vendor keys, every time.

CWE-798

Hard-coded in source

The fastest way to ship a PoC. The slowest to remove from git history. We routinely find OPENAI_API_KEY = "sk-prod-…" in config.py, in feature branches that were merged a year ago.

Control  Pre-commit secrets scanner + a gateway that issues short-lived PATs instead of long-lived sk- keys.
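
Where that hook bites, a minimal sketch: a plain git pre-commit script that greps the staged diff for vendor key prefixes. The regex and messages are illustrative; a real rollout would use a maintained scanner such as gitleaks or detect-secrets.

    #!/usr/bin/env python3
    # Hypothetical pre-commit hook: block any commit whose staged diff
    # adds something that looks like a raw vendor key.
    import re
    import subprocess
    import sys

    # Prefixes of long-lived vendor credentials that must never land in git.
    KEY_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b")

    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout

    hits = [line for line in diff.splitlines()
            if line.startswith("+") and KEY_PATTERN.search(line)]

    if hits:
        print("Commit blocked: staged changes contain what looks like a raw vendor key.")
        print("Mint a short-lived PAT from the gateway instead.")
        sys.exit(1)
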
CWE-532

Logged in plaintext

Curl invocations, Datadog spans, debug print statements. The key shows up in your log retention forever. Anyone with log access becomes a credential holder.

Control  Gateway never sees raw keys. Outbound traffic uses a server-side credential vault. Logs see PATs: rotatable, scoped, revocable.

CWE-540

Pasted into a wiki / Slack / ticket

An on-call hands the key to a colleague to unblock them. The Slack thread is searchable for 90 days. The Confluence page is searchable forever.

Control  Per-developer PATs from your IdP. No reason to ever share one. Every PAT is per-person and is rotated as part of the leaver process.

OWASP A02

Embedded in a frontend

Marketing site or browser extension calling OpenAI directly. View Source. The key is now public. Vendor revokes within hours. Damage already done.

Control  Browser clients call your gateway with a session token. The gateway holds the vendor credential. Always.

Supply-chain

Inherited from a SaaS

You bought a tool that “just connects to OpenAI”. Their key is your key. Their breach becomes yours. You have no way to rotate.

Control  Insist on bring-your-own-key with rotation. Or front the SaaS with the gateway and route to vendors yourself.

Insider

Leaver still has the key

HR off-boarded the engineer. SCM revoked their commit access. Nobody rotated the OpenAI key they had memorised. They can still spend your money for months.

Control  Identity-rooted PATs: revoking the IdP user revokes the gateway access. No standing vendor credential to forget.

How spend escapes

A vendor invoice is the wrong place to learn what happened.

Without a gateway in front, the only spend signal is the monthly invoice. By the time it arrives, the damage is done and the cause is gone. Three patterns behind most avoidable bill spikes we audit:

Wrong-tier defaults

An engineer copy-pasted gpt-4o from a tutorial. The chatbot answers “reset my password” with $0.05 of premium tokens. 1,000 users a day × 30 days = $1,500 burned on the smallest possible question.

Fix  Per-product policy pins a cheaper default model. Escalate to GPT-4o only when the response judge scores quality < 0.7.
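
What that policy looks like at the call site, as a sketch rather than the shipped router: the model names, the stub judge, and the 0.7 threshold stand in for whatever your product pins.

    # Pin a cheap default; escalate once, only when the judge scores low.
    from openai import OpenAI

    client = OpenAI()  # in production, base_url points at the gateway

    DEFAULT_MODEL = "gpt-4o-mini"   # pinned per product by policy
    ESCALATION_MODEL = "gpt-4o"     # premium tier, quality escalations only

    def judge_quality(prompt: str, answer: str) -> float:
        # Stand-in for a real response judge (an LLM or heuristic scorer).
        return 1.0 if len(answer or "") > 40 else 0.0

    def answer(prompt: str) -> str:
        draft = client.chat.completions.create(
            model=DEFAULT_MODEL,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        if judge_quality(prompt, draft) >= 0.7:
            return draft
        # Below threshold: pay for the premium model once, not by default.
        return client.chat.completions.create(
            model=ESCALATION_MODEL,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content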

No cache, no compression

The same support FAQ, asked 4,200 times this week. Every call hits the vendor at full price. With aggressive caching and prompt-template normalisation, it’s common for 20-40% of calls to return in < 5 ms at near-zero marginal cost.

Fix  Semantic + exact cache, configurable TTL per route, signed result for replay.
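
A sketch of the exact-match half under illustrative route names and TTLs; the semantic half would sit behind an embedding lookup, and the signed-replay detail is omitted here.

    # Exact cache: hash the normalised prompt, honour a per-route TTL.
    import hashlib
    import time

    TTL_PER_ROUTE = {"embeddings": 86_400, "classify": 3_600}  # seconds
    _cache: dict[str, tuple[float, str]] = {}

    def cache_key(route: str, model: str, prompt: str) -> str:
        # Normalise whitespace so trivial template differences still hit.
        normalised = " ".join(prompt.split())
        return hashlib.sha256(f"{route}|{model}|{normalised}".encode()).hexdigest()

    def lookup(route: str, key: str) -> str | None:
        ttl = TTL_PER_ROUTE.get(route)
        if ttl is None:          # route not cacheable, e.g. completions
            return None
        hit = _cache.get(key)
        if hit and time.time() - hit[0] < ttl:
            return hit[1]        # served locally, zero vendor cost
        return None

    def store(route: str, key: str, response: str) -> None:
        if route in TTL_PER_ROUTE:
            _cache[key] = (time.time(), response)

Completion routes stay uncached by default (no TTL entry), which is the same answer given to the correctness objection further down.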

Runaway loops

An agent in a retry loop, an MCP tool that returns 0 rows, a chain that fans out unbounded. No cap, no budget, no early stop. The bill triples in an afternoon.

Fix  Per-agent monthly cap with auto-pause. Per-call iteration ceiling. Spend forecast alerts at 80%.
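
The guard rails fit in a page. A sketch under assumed names (AgentBudget, alert_finance), not the gateway's real interface.

    # Per-agent monthly cap with auto-pause, an iteration ceiling,
    # and a one-shot forecast alert at 80% of the cap.
    class BudgetExceeded(Exception):
        pass

    def alert_finance(budget: "AgentBudget") -> None:
        # Stand-in for a Slack/SIEM notification.
        print(f"spend alert: ${budget.spent:.2f} of ${budget.cap:.2f} cap")

    class AgentBudget:
        def __init__(self, monthly_cap_usd: float, max_iterations: int = 25):
            self.cap = monthly_cap_usd
            self.max_iterations = max_iterations
            self.spent = 0.0
            self.paused = False
            self.alerted = False

        def charge(self, cost_usd: float, iteration: int) -> None:
            if self.paused:
                raise BudgetExceeded("agent paused: monthly cap reached")
            if iteration > self.max_iterations:
                raise BudgetExceeded("per-call iteration ceiling hit")
            self.spent += cost_usd
            if not self.alerted and self.spent >= 0.8 * self.cap:
                alert_finance(self)
                self.alerted = True
            if self.spent >= self.cap:
                self.paused = True   # auto-pause: fail closed, not silent

Raising instead of silently dropping the call is deliberate: the agent's owner finds out in a stack trace, not on next month's invoice.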

The fix

One drop-in.
Two failure modes solved.

Both problems collapse to the same control: don’t let raw vendor credentials live in your developer environment. Put a gateway between the application and the vendor. The gateway holds the credential, issues identity-rooted PATs to apps, and observes every call on the way out.

  • Code change: replace the OpenAI base URL (sketched after this list). ~30 minutes per service.
  • Vendor credentials live in the gateway’s vault. Apps never see them.
  • PATs are issued per developer, per agent, per service — rotatable, revocable.
  • Every call observable: model, region, tokens, cost, principal, agent, latency, verdict.
  • Spend caps, model pins, redaction policies enforced inline.
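
The first bullet is small enough to show whole. A sketch assuming the official OpenAI Python SDK and a hypothetical gateway hostname; the call shape is unchanged, only the base URL and the credential move.

    # Before: OpenAI(api_key=os.environ["OPENAI_API_KEY"]) straight to the vendor.
    # After: a gateway-issued PAT and your own hostname.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://llm-gateway.internal.example.com/v1",  # assumed hostname
        api_key=os.environ["GATEWAY_PAT"],  # short-lived, per-developer PAT
    )

    # Every call below is now attributed, capped, and policy-checked inline.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(response.choices[0].message.content)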

Honest objections

What a CTO usually pushes back with.

“A gateway is a single point of failure.”

The gateway is stateless and runs in your VPC. Scale horizontally, drain gracefully, fail open or closed per route. We publish HA and chaos-test runbooks. p99 added latency < 25 ms.

“We’ll lose features the vendor adds tomorrow.”

The gateway is wire-compatible and forwards unknown fields. New OpenAI features ship to your apps the day they ship at the vendor. We don’t parse the request body unless a scanner asks for it.

“Caching breaks correctness.”

Cache TTL and key are configurable per route, off by default for completion endpoints. Most savings come from idempotent routes (embeddings, classifications) where cache is correctness-preserving.

“We already log spend in our cloud bill.”

Vendor invoice gives you four lines and a month’s lag. The gateway gives you per-request, per-team, per-agent attribution — in a dashboard, in a SIEM, in a Slack alert.

Find every key.
Cap every agent.

A two-week pilot: scan the environment for vendor keys, point one app at the gateway, mint per-developer PATs, watch the request log fill up. Walk away with the inventory and a path to retire the long-lived keys.