Why every platform team is rebuilding around LLMs
RAG pipelines, agent runtimes, and eval harnesses are moving from side projects to core infrastructure. Here's what the shift looks like in production.

For most of the last decade, "platform engineering" meant Kubernetes, CI/CD, and a paved road for shipping services. In 2026, a second paved road is being laid down next to it — one for LLM-powered features — and the teams who own it look a lot like the teams who owned containers five years ago.
The new primitives
A production LLM stack has settled into a recognizable shape:
- Gateways that handle routing, retries, rate limits, and cost attribution across providers.
- Retrieval pipelines (RAG) that keep a vector index fresh and observable.
- Eval harnesses that score outputs continuously, not just at release.
- Agent runtimes with tool sandboxes, timeouts, and audit logs.
None of these are research problems anymore. They're infrastructure, and infrastructure is what platform teams do.
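To make "score outputs continuously" concrete, here is a minimal sketch of a rolling eval monitor. Everything in it is an assumption for illustration: the `RollingEval` class, the `exact_match` scorer, and the window and threshold values are hypothetical, and a real harness would likely use richer scorers and persist results somewhere durable.

```python
from collections import deque
from statistics import mean


class RollingEval:
    """Keeps the last `window` scores and flags a drop below `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.scores = deque(maxlen=window)  # old scores fall off automatically
        self.threshold = threshold

    def record(self, score: float) -> None:
        self.scores.append(score)

    def healthy(self) -> bool:
        # With no data yet, there is nothing to alert on.
        return not self.scores or mean(self.scores) >= self.threshold


def exact_match(output: str, expected: str) -> float:
    """Toy scorer: 1.0 if the normalized strings match, else 0.0."""
    return float(output.strip().lower() == expected.strip().lower())


# Hypothetical production samples: (model output, expected label).
samples = [("Billing", "billing"), ("bug", "billing"), ("bug", "billing")]

monitor = RollingEval(window=3, threshold=0.5)
for output, expected in samples:
    monitor.record(exact_match(output, expected))

if not monitor.healthy():
    print("eval score dropped below threshold")
```

The point of the rolling window is that a regression shows up while traffic is flowing, rather than at the next release gate.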
Treat prompts like config
The biggest cultural shift is treating prompts and model choices as versioned configuration, not code buried in a service:
    from observe import gateway

    resp = gateway.chat(
        model="claude-opus-4-8",
        prompt_id="support/triage@v7",  # versioned, not inline
        inputs={"ticket": ticket_text},
        budget_usd=0.02,  # hard ceiling, attributed to the team
    )
Once prompts are versioned artifacts, you can roll them back, A/B them, and diff them — the same things you already do for everything else in production.
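As a rough illustration of what rolling back and diffing prompts can look like, here is a small in-memory registry. It is a sketch under stated assumptions, not how any particular gateway stores prompts: `PromptRegistry` and its methods are hypothetical names, and the `id@vN` reference format simply mirrors the `prompt_id` style shown above. Only the standard-library `difflib` module is used.

```python
import difflib


class PromptRegistry:
    """In-memory store of prompt versions, keyed by prompt id (hypothetical)."""

    def __init__(self):
        self._versions: dict[str, list[str]] = {}
        self._active: dict[str, int] = {}

    def publish(self, prompt_id: str, text: str) -> int:
        versions = self._versions.setdefault(prompt_id, [])
        versions.append(text)
        self._active[prompt_id] = len(versions)  # versions are 1-based
        return len(versions)

    def rollback(self, prompt_id: str, version: int) -> None:
        if not 1 <= version <= len(self._versions[prompt_id]):
            raise ValueError(f"unknown version {version} for {prompt_id}")
        self._active[prompt_id] = version

    def get(self, ref: str) -> str:
        # Accepts "id" (active version) or "id@vN" (pinned version).
        prompt_id, _, tag = ref.partition("@v")
        version = int(tag) if tag else self._active[prompt_id]
        return self._versions[prompt_id][version - 1]

    def diff(self, prompt_id: str, old: int, new: int) -> str:
        a = self._versions[prompt_id][old - 1].splitlines()
        b = self._versions[prompt_id][new - 1].splitlines()
        return "\n".join(
            difflib.unified_diff(a, b, f"v{old}", f"v{new}", lineterm="")
        )


registry = PromptRegistry()
registry.publish("support/triage", "Classify the ticket.")
registry.publish("support/triage", "Classify the ticket.\nReply with one label.")

print(registry.diff("support/triage", 1, 2))  # review the change like a code diff
registry.rollback("support/triage", 1)        # v2 misbehaves, so pin back to v1
```

In practice the store would more likely be a git repository or a database table, but the operations stay the same: publish, pin, roll back, diff.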
What to build first
If you're standing this up, start with the gateway and cost attribution. You can't manage what you can't see, and the first painful surprise is always the bill. Retrieval and evals come next, once you have traffic flowing through one observable choke point.
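Cost attribution can start very small. The sketch below assumes a per-team ledger that prices each call from its token counts and refuses any call that would push a team over its budget. The `CostLedger` class, the team name, and the per-1k-token prices are all made up for illustration; real prices come from your provider's rate card.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a call would push a team past its budget."""


class CostLedger:
    """Tracks spend per team and rejects calls that would exceed a budget."""

    def __init__(self, budgets_usd: dict[str, float]):
        self.budgets = budgets_usd
        self.spent = defaultdict(float)

    def charge(self, team: str, input_tokens: int, output_tokens: int,
               usd_per_1k_in: float, usd_per_1k_out: float) -> float:
        cost = (input_tokens * usd_per_1k_in
                + output_tokens * usd_per_1k_out) / 1000
        # Teams without a configured budget get a budget of zero.
        if self.spent[team] + cost > self.budgets.get(team, 0.0):
            raise BudgetExceeded(f"{team} would exceed its budget")
        self.spent[team] += cost
        return cost


# Hypothetical budget and prices, chosen only to make the arithmetic easy.
ledger = CostLedger({"support": 1.00})
cost = ledger.charge("support", input_tokens=2000, output_tokens=500,
                     usd_per_1k_in=0.01, usd_per_1k_out=0.03)
print(f"charged ${cost:.3f}, support has spent ${ledger.spent['support']:.3f}")
```

Because every call passes through `charge`, the same record that enforces the ceiling also answers the question of which team the bill belongs to.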
The teams getting this right aren't the ones with the fanciest models. They're the ones who made LLM calls boring, observable, and cheap to change.