
Observability

Process logs, health endpoints, and OpenTelemetry configuration.

Intended audience: Stakeholders, Business analysts, Solution architects, Developers, Testers

Learning outcomes by role

Stakeholders

  • Explain stdout logs, health endpoints, and tracing as operational visibility investments.

Business analysts

  • Tie observability signals to SLIs in runbooks and incident templates.

Solution architects

  • Design OTLP export, sampling, and collector placement for production.

Developers

  • Tune process logging in `cadence.main`, OTel keys in `global_settings`, and `/api/admin/telemetry` APIs.

Testers

  • Validate health checks, log fields, and trace spans in staging environments.

Cadence exposes three observability layers: process logs on stdout (plain text by default), HTTP health and pool endpoints for probes, and OpenTelemetry settings that can be adjusted via admin APIs without restarting the process.

  • Operational spend — Logs and traces have marginal cost; aggressive tracing and high-cardinality LLM spans can raise collector and storage bills.
  • Progressive rollout — stdout logs and /health first; add OTLP with sampling before full production load.
  • SLIs — Pair log-based error rate with trace latency for chat paths when defining SLOs.
  • Runbooks — Reference otel.* keys in global_settings and log pipeline routing in incident checklists.
Figure: Observability layers. Process logs on stdout (cadence.main, logging.basicConfig); health and pool endpoints (/health, admin pool stats; see Monitoring guide); optional OpenTelemetry (otel.* keys via admin APIs, exported over OTLP).

Implementation references: cadence.domain.telemetry (otel.* keys), cadence.api.health, cadence.main logging setup.

| Layer | How to enable | What it provides |
| --- | --- | --- |
| Process logs | logging.basicConfig in cadence.main (text by default) | Plain stdout logs; ship with your platform agent or sidecar |
| Health endpoints | Always available | Liveness (GET /health) and deeper admin health |
| Distributed traces | otel.enabled in global_settings (via /api/admin/telemetry or DB) | Spans across internal steps; export via OTLP when configured |

Start with stdout logs + /health in staging, then enable OTLP with a low sample rate before raising to production traffic. Jumping straight to always_on tracing on busy clusters can overwhelm your collector.
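As an illustration of the rollout step above, the following sketch builds (but does not send) a PATCH request that turns tracing on. The base URL, the collector address, and the flat JSON payload shape are assumptions, not confirmed behavior of /api/admin/telemetry; the key that controls sampling is not shown in this guide, so it is left out here.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your deployment's address.
BASE_URL = "http://localhost:8000"


def build_telemetry_patch(settings: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for /api/admin/telemetry.

    Assumption: the endpoint accepts a flat JSON object of otel.* keys.
    """
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/admin/telemetry",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


request = build_telemetry_patch(
    {
        "otel.enabled": True,
        "otel.exporter": "otlp_grpc",
        "otel.endpoint": "http://otel-collector:4317",  # hypothetical collector address
    }
)
# To apply it, add your admin auth headers and call urllib.request.urlopen(request).
```

Building the request separately from sending it makes the payload easy to review in a change ticket before it touches a live cluster.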

Default API process logging is configured in cadence.main with logging.basicConfig and uses a plain-text format. The otel.logs_enabled setting (stored with the other otel.* keys) controls OpenTelemetry log export; it is unrelated to any CADENCE_LOG_FORMAT environment variable.
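For readers unfamiliar with logging.basicConfig, here is a minimal sketch of a plain-text stdout setup. The format string, the logger name, and the helper name configure_text_logging are illustrative assumptions; the actual arguments used in cadence.main may differ.

```python
import io
import logging
import sys


def configure_text_logging(stream=sys.stdout, level=logging.INFO) -> None:
    """Route root-logger output to `stream` in a plain-text format."""
    logging.basicConfig(
        stream=stream,
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",  # assumed layout
        force=True,  # replace handlers installed by earlier calls (Python 3.8+)
    )


# Demonstration: capture output in a buffer instead of stdout.
buffer = io.StringIO()
configure_text_logging(stream=buffer)
logging.getLogger("cadence.example").info("service started")
print(buffer.getvalue(), end="")
```

In a real deployment the stream stays at sys.stdout so that the platform agent or sidecar can collect the lines.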

Stream errors on SSE chat are logged and emitted on the wire from cadence/api/chat/router.py:

cadence/api/chat/router.py
except Exception as e:
    logger.error("Stream error: %s", e, exc_info=True)
    ...
    yield f"event: error\ndata: {json.dumps(err_data)}\n\n"
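To make the wire format concrete, the sketch below reproduces the frame from the excerpt and parses it back, as a client or a test might. The helper names and the payload contents are hypothetical; only the `event: error` / `data:` layout comes from the excerpt.

```python
import json


def format_sse_error(err_data: dict) -> str:
    """Serialize an error payload as one SSE frame with event type 'error'."""
    return f"event: error\ndata: {json.dumps(err_data)}\n\n"


def parse_sse_frame(frame: str) -> tuple[str, dict]:
    """Parse a frame with a single data line back into (event, payload)."""
    fields = {}
    for line in frame.strip().split("\n"):
        name, _, value = line.partition(": ")
        fields[name] = value
    return fields["event"], json.loads(fields["data"])


frame = format_sse_error({"message": "upstream timeout"})  # hypothetical payload
event, payload = parse_sse_frame(frame)
```

Because json.dumps emits no raw newlines by default, the payload always fits on one `data:` line, which is what keeps this simple parser valid.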

OTel settings are persisted as otel.* rows in global_settings. HTTP access: GET / PATCH /api/admin/telemetry (cadence/api/telemetry/router.py). Key metadata for validation lives in cadence/domain/telemetry/service.py:

cadence/domain/telemetry/service.py
_KEY_META: dict[str, tuple[str, str]] = {
    "otel.enabled": ("boolean", "Enable OpenTelemetry instrumentation"),
    "otel.service_name": ("string", "OTel service name reported to the collector"),
    "otel.service_version": (
        "string",
        "OTel service version reported to the collector",
    ),
    "otel.environment": ("string", "Deployment environment label"),
    "otel.exporter": (
        "string",
        "Exporter backend: console | otlp_grpc | otlp_http | none",
    ),
    # ... remaining otel.* keys — see source for the full map
}
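One way such metadata could drive validation is sketched below. This is not the service's actual logic: validate_settings, the type-label mapping, and the exporter check are assumptions built only from the entries shown in the excerpt.

```python
# Subset of the key metadata from the excerpt; the real map has more entries.
KEY_META: dict[str, tuple[str, str]] = {
    "otel.enabled": ("boolean", "Enable OpenTelemetry instrumentation"),
    "otel.service_name": ("string", "OTel service name reported to the collector"),
    "otel.exporter": ("string", "Exporter backend: console | otlp_grpc | otlp_http | none"),
}

# Assumed mapping from the type labels to Python types.
_PYTHON_TYPES = {"boolean": bool, "string": str}
# Values listed in the otel.exporter description.
_EXPORTERS = {"console", "otlp_grpc", "otlp_http", "none"}


def validate_settings(updates: dict) -> list[str]:
    """Return human-readable problems; an empty list means the update looks valid."""
    problems = []
    for key, value in updates.items():
        if key not in KEY_META:
            problems.append(f"unknown key: {key}")
            continue
        type_label = KEY_META[key][0]
        if not isinstance(value, _PYTHON_TYPES[type_label]):
            problems.append(f"{key}: expected {type_label}, got {type(value).__name__}")
        elif key == "otel.exporter" and value not in _EXPORTERS:
            problems.append(f"{key}: unsupported exporter {value!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets an admin UI or test report every bad key in a single pass.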
To investigate a slow or failing chat request:

  1. Note the approximate time, org, orchestrator, and message id.
  2. Search the process logs for errors or stream warnings around that window.
  3. If traces are enabled, open your collector and find the trace id, then compare model latency, tool latency, and queue wait.
  4. Check pool stats (GET /api/admin/pool/stats). Long queue times often indicate that instances were evicted from the demand pool or never loaded.
  5. If traces are missing entirely, verify otel.enabled and the exporter settings via the admin APIs.
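The log search in step 2 can be scripted when a log pipeline is not yet available. The sketch below filters plain-text lines to errors and warnings near an incident time; the line layout (date, time, level, logger, message) is an assumption and should be adjusted to whatever format your deployment emits.

```python
from datetime import datetime, timedelta

# Assumed line layout: "<YYYY-MM-DD> <HH:MM:SS[,mmm]> <LEVEL> <logger> <message>".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def errors_near(lines, incident_time: datetime, window: timedelta):
    """Yield ERROR/WARNING lines whose timestamp is within `window` of the incident."""
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 4:
            continue  # not in the assumed layout (for example, a traceback line)
        clock = parts[1].split(",")[0]  # drop milliseconds if present
        try:
            stamp = datetime.strptime(f"{parts[0]} {clock}", TIMESTAMP_FORMAT)
        except ValueError:
            continue  # first two fields are not a timestamp
        if parts[2] in {"ERROR", "WARNING"} and abs(stamp - incident_time) <= window:
            yield line


sample = [
    "2024-05-01 12:00:03,120 ERROR cadence.api.chat Stream error: boom",
    "2024-05-01 12:00:04,000 INFO cadence.api.chat request finished",
    "2024-05-01 13:30:00,000 ERROR cadence.api.chat Stream error: later",
]
hits = list(errors_near(sample, datetime(2024, 5, 1, 12, 0, 0), timedelta(minutes=5)))
```

Only the first sample line survives the filter: the second is INFO and the third falls outside the five-minute window.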
| Symptom | Cause | Fix |
| --- | --- | --- |
| No traces | otel.enabled is false, or exporter/network misconfigured | Set otel.enabled=true; check otel.endpoint and network egress |
| High cardinality costs | SDK instrumentation flags enabled in production | Disable otel.instrument_langchain / otel.instrument_openai_agents |
| Plain text logs only | Default logging config in cadence.main | Add a log shipper or enable OTel log export via otel.logs_enabled |
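The first two rows of the troubleshooting table can be turned into a quick check against a snapshot of the otel.* settings. This is a hypothetical helper, not part of Cadence; it assumes the snapshot is a flat dict of key to value, and that an exporter of "none" means spans are not exported.

```python
def diagnose_telemetry(settings: dict) -> list[str]:
    """Map a snapshot of otel.* settings to likely causes from the troubleshooting table."""
    findings = []
    exporter = settings.get("otel.exporter")
    if not settings.get("otel.enabled", False):
        findings.append("otel.enabled is false: no traces will be produced")
    elif exporter in (None, "none"):
        # Assumption: "none" disables export entirely.
        findings.append("otel.exporter is unset or 'none': spans are not exported")
    elif exporter.startswith("otlp") and not settings.get("otel.endpoint"):
        findings.append("otel.endpoint is empty: OTLP exporter has no target")
    for flag in ("otel.instrument_langchain", "otel.instrument_openai_agents"):
        if settings.get(flag):
            findings.append(f"{flag} is on: expect higher span volume and cardinality")
    return findings
```

For example, `diagnose_telemetry({"otel.enabled": True, "otel.exporter": "otlp_grpc"})` reports the missing endpoint, which points at the "No traces" row before anyone has to inspect the collector.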