Operations guide

Name: go-oidc-provider
Author: libraz

The library deliberately stays out of HTTP-lifecycle and infrastructure concerns: it mounts no `/metrics` route, has no opinion about how keys live in your cluster, and ships no built-in tracer. This section documents the seams the OP exposes and how production embedders typically wire them.

What lives here

Page	When you need it
Key rotation	rotating signing keys (`Keyset`) and cookie keys (`WithCookieKey`) without dropping live sessions
JWKS endpoint	what `/jwks` advertises, cache headers, ETag, RP cache behaviour
Multi-instance deployment	running more than one OP replica — DPoP nonce sharing, session placement, sticky vs round-robin
Observability	logging (`*slog.Logger`), tracing (`otelhttp` middleware), Prometheus, request-IDs
Backup & disaster recovery	what to back up, what to skip, RPO targets per substore

Out of scope (and why)

Concern	Why the library doesn't ship it
`/metrics` HTTP route	your router's job; the library exposes counters via `WithPrometheus(reg)` and lets you mount `/metrics` where it fits your auth boundary
HTTP request-duration histograms	belongs in HTTP middleware (`otelhttp`, `prometheus/promhttp`); the OP only emits OIDC-business counters
Tracing instrumentation inside endpoint handlers	the public surface is an `http.Handler`; wrap with `otelhttp.NewMiddleware` once at the seam
Rate limiting	upstream — Cloudflare, Envoy, or a Go middleware of your choice. The OP emits `rate_limit.exceeded` audit events when something in the chain rejects a request, but does not implement the limiter itself
Health checks	embedder territory; the OP has no opinion about liveness vs readiness for your stack
Background workers	the OP keeps no in-process state between requests and starts no internal goroutines for cleanup. TTL-based eviction is the store's responsibility

Operational philosophy

A handful of decisions cascade through every page in this section:

One backend per transactional cluster. The composite store refuses to split clients / codes / refresh tokens / access tokens / IATs across two backends — see Hot/cold split.
Volatile substores are best-effort. Sessions, DPoP nonces, and the JAR `jti` registry can live on Redis without persistence. Eviction is a normal operating mode; the OP audits the gap rather than failing open. See Multi-instance deployment.
No background goroutines. TTL cleanup is the store's job: SQL adapters rely on periodic pruning run by your database scheduler, and Redis adapters use native TTL. There is no in-process janitor that could leak on hot-reload.
Configuration changes require a new Provider. Rotating keys, adding clients, and changing scopes all mean constructing a new `*Provider` and replacing the handler. See Key rotation for the supervisor-side pattern.
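
Because configuration is fixed at construction, the embedder needs a way to replace the live handler. One minimal approach, assuming Go 1.19+ for `atomic.Pointer`, is a wrapper whose pointer is swapped atomically. Each request loads the current handler once at entry, so in-flight requests finish on the old provider while new ones reach the replacement. `buildHandler` is a hypothetical stand-in for constructing a new `*Provider` and taking its handler; the supervisor-side pattern on the Key rotation page may differ.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// swappableHandler serves whichever handler was stored last.
// No locks on the request path: one atomic load per request.
type swappableHandler struct {
	current atomic.Pointer[http.Handler]
}

func newSwappable(h http.Handler) *swappableHandler {
	s := &swappableHandler{}
	s.Swap(h)
	return s
}

// Swap installs h for all requests that start after this call.
func (s *swappableHandler) Swap(h http.Handler) { s.current.Store(&h) }

func (s *swappableHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	(*s.current.Load()).ServeHTTP(w, r)
}

// buildHandler is hypothetical: in practice, construct a new *Provider
// from the new configuration and return its http.Handler.
func buildHandler(version string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, version)
	})
}

// body issues an in-memory GET and returns the response body.
func body(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/jwks", nil))
	return rec.Body.String()
}

func main() {
	root := newSwappable(buildHandler("config-v1"))
	fmt.Println(body(root)) // config-v1

	// Key rotation, new client, scope change: build first, then swap.
	root.Swap(buildHandler("config-v2"))
	fmt.Println(body(root)) // config-v2
}
```

Build the replacement fully before calling `Swap`; if your construction step returns an error, skip the swap and the old handler keeps serving.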

Where to start

If you've never run the OP in production, read these in order:

Key rotation — the most common ops cadence (monthly, quarterly).
Multi-instance deployment — required reading before scaling past one replica.
Observability — wire logging / metrics / tracing on day one rather than after an incident.
Backup & DR — at least set the RPO targets before you go live.
