Why · 01
The two failure modes
every CFO and CISO sees first.
Leaked API keys and runaway spend are the most common, most embarrassing, and most preventable incidents in enterprise LLM adoption. They share a single root cause — raw vendor credentials are loose in the developer environment — and a single fix.
Indicative patterns from discovery assessments
Two recurring patterns.
Key custody
~12-40
Distinct OpenAI / Anthropic / vendor keys often discovered per organisation in first-pass assessments, across CI configs, developer machines, and forgotten Lambda environment variables. Ownership is frequently unclear for 40-60% of them.
Spend attribution
~60-80%
Estimated share of LLM spend finance teams initially can’t attribute beyond “Engineering / OpenAI” on the vendor invoice. Per-team, per-product, and per-agent breakdowns are often reconstructed manually, weeks late.
Ranges are directional field estimates from recent discovery work and should be treated as planning inputs, not guarantees.
How keys leak
Six places we find raw vendor keys, every time.
Hard-coded in source
The fastest way to ship a PoC. The slowest to remove from git history. We routinely find OPENAI_API_KEY = "sk-prod-…" in config.py, in feature branches that were merged a year ago.
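The pattern is mechanically easy to catch. A minimal sketch of the kind of match a secret scan runs, assuming an `sk-`-prefixed key format; production scanners such as gitleaks or trufflehog cover many more vendor formats plus entropy checks:

```python
import re

# Illustrative pattern only: matches "sk-"-prefixed strings of 20+ key
# characters. Real scanners handle many more vendor key formats.
VENDOR_KEY = re.compile(r"sk-[A-Za-z0-9_-]{20,}")

def find_candidate_keys(text: str) -> list[str]:
    """Return substrings that look like raw vendor API keys."""
    return VENDOR_KEY.findall(text)
```

Run it over full git history, not just HEAD — a key deleted from the current branch is still alive in every merged feature branch.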
Logged in plaintext
Curl invocations, Datadog spans, debug print statements. The key sits in your logs for the full retention window. Anyone with log access becomes a credential holder.
Pasted into a wiki / Slack / ticket
An on-call hands the key to a colleague to unblock them. The Slack thread is searchable for 90 days. The Confluence page is searchable forever.
Embedded in a frontend
Marketing site or browser extension calling OpenAI directly. View Source. The key is now public. Vendor revokes within hours. Damage already done.
Inherited from a SaaS
You bought a tool that “just connects to OpenAI”. Their key is your key. Their breach becomes yours. You have no way to rotate.
Leaver still has the key
HR off-boarded the engineer. SCM revoked their commit access. Nobody rotated the OpenAI key they had memorised. They can still spend your money for months.
How spend escapes
A vendor invoice is the wrong place to learn what happened.
Without a gateway in front, the only spend signal is the monthly invoice. By the time it arrives, the damage is done and the cause is gone. Three patterns account for most of the avoidable bill spikes we audit:
Wrong-tier defaults
An engineer copy-pasted gpt-4o from a tutorial. The chatbot answers “reset my password” with $0.05 of premium tokens. 1,000 users a day × 30 days = $1,500 burned on the smallest possible question.
Fix Per-product policy pins the default model. Escalate to GPT-4o only when the response judge says quality < 0.7.
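The escalation rule fits in a few lines. A sketch only — the model names, the 0.7 threshold, and the `call_llm` / `judge` interfaces are illustrative assumptions, not a fixed API:

```python
# Pin a cheap default model per product; escalate to the premium tier
# only when a response-quality judge scores the cheap reply too low.
CHEAP_MODEL = "gpt-4o-mini"   # illustrative model ids
PREMIUM_MODEL = "gpt-4o"
QUALITY_FLOOR = 0.7

def answer(prompt: str, call_llm, judge) -> str:
    """call_llm(model, prompt) -> str; judge(prompt, reply) -> float in [0, 1]."""
    reply = call_llm(CHEAP_MODEL, prompt)
    if judge(prompt, reply) < QUALITY_FLOOR:
        # Only the hard questions pay premium-token prices.
        reply = call_llm(PREMIUM_MODEL, prompt)
    return reply
```

The "reset my password" class of question never crosses the threshold, so it never touches the premium tier.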
No cache, no compression
The same support FAQ, asked 4,200 times this week. Every call hits the vendor at full price. With aggressive caching and prompt-template normalisation, it’s common for 20-40% of calls to return in < 5 ms at near-zero marginal cost.
Fix Semantic + exact cache, configurable TTL per route, signed result for replay.
Runaway loops
An agent in a retry loop, an MCP tool that returns 0 rows, a chain that fans out unbounded. No cap, no budget, no early stop. The bill triples in an afternoon.
Fix Per-agent monthly cap with auto-pause. Per-call iteration ceiling. Spend forecast alerts at 80%.
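Those caps reduce to a small amount of bookkeeping per agent. The cap values, the 80% alert threshold, and the method names here are illustrative:

```python
class AgentBudget:
    """Per-agent monthly spend cap with auto-pause and an iteration ceiling."""

    def __init__(self, monthly_cap_usd: float, max_iterations: int = 25):
        self.cap = monthly_cap_usd
        self.max_iterations = max_iterations
        self.spent = 0.0
        self.paused = False

    def alert_due(self) -> bool:
        # Fire the spend-forecast alert once 80% of the cap is committed.
        return self.spent >= 0.8 * self.cap

    def charge(self, cost_usd: float, iteration: int) -> bool:
        """Record one call; return False when the agent must stop."""
        if self.paused or iteration >= self.max_iterations:
            return False
        self.spent += cost_usd
        if self.spent >= self.cap:
            self.paused = True  # auto-pause: no further spend this month
        return not self.paused
```

A retry loop hits the iteration ceiling within one call chain; a slow leak hits the 80% alert before it hits the cap.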
The fix
One drop-in.
Two failure modes solved.
Both problems collapse to the same control: don’t let raw vendor credentials live in your developer environment. Put a gateway between the application and the vendor. The gateway holds the credential, issues identity-rooted PATs to apps, and observes every call on the way out.
- Code change: replace the OpenAI base URL. ~30 minutes per service.
- Vendor credentials live in the gateway’s vault. Apps never see them.
- PATs are issued per developer, per agent, per service — rotatable, revocable.
- Every call observable: model, region, tokens, cost, principal, agent, latency, verdict.
- Spend caps, model pins, redaction policies enforced inline.
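The "replace the base URL" change looks like this in practice — a sketch, assuming the OpenAI Python SDK; the gateway hostname and the `GATEWAY_PAT` variable name are placeholders, not real endpoints:

```python
import os
from openai import OpenAI

# Before: client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# After: same SDK, same calls — only the base URL and the credential change.
client = OpenAI(
    base_url="https://llm-gateway.internal.example/v1",  # hypothetical gateway
    api_key=os.environ["GATEWAY_PAT"],  # per-developer PAT, not a vendor key
)
```

Every request the app makes now carries an identity the gateway can attribute, cap, and revoke; the raw vendor key never leaves the vault.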
Honest objections
What a CTO usually pushes back with.
“A gateway is a single point of failure.”
The gateway is stateless and runs in your VPC. Scale horizontally, drain gracefully, fail open or closed per route. We publish HA and chaos-test runbooks. p99 added latency < 25 ms.
“We’ll lose features the vendor adds tomorrow.”
The gateway is wire-compatible and forwards unknown fields. New OpenAI features ship to your apps the day they ship at the vendor. We don’t parse the request body unless a scanner asks for it.
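Pass-through forwarding is why new fields survive. A sketch of the idea, assuming the gateway reads only the fields its policies need and forwards the body untouched:

```python
import json

def route(raw_body: bytes) -> tuple[str, bytes]:
    """Extract the one field policy needs; forward the original bytes as-is."""
    model = json.loads(raw_body).get("model", "")
    return model, raw_body  # unknown vendor fields pass through unchanged

body = b'{"model": "gpt-4o", "brand_new_vendor_field": {"x": 1}}'
model, forwarded = route(body)
```

A field the gateway has never seen still reaches the vendor byte-for-byte, so a new API feature works the day it ships.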
“Caching breaks correctness.”
Cache TTL and key are configurable per route, off by default for completion endpoints. Most savings come from idempotent routes (embeddings, classifications) where cache is correctness-preserving.
“We already log spend in our cloud bill.”
Vendor invoice gives you four lines and a month’s lag. The gateway gives you per-request, per-team, per-agent attribution — in a dashboard, in a SIEM, in a Slack alert.
Find every key.
Cap every agent.
A two-week pilot: scan the environment for vendor keys, point one app at the gateway, mint per-developer PATs, watch the request log fill up. Walk away with the inventory and a path to retire the long-lived keys.