AI Apps

Orchestrator

CRUD, lifecycle load/unload, pool tiers, and domain validation.

Intended audience: Stakeholders, Business analysts, Solution architects, Developers, Testers

Learning outcomes by role

Stakeholders

  • Explain AI Apps as billable or capacity-consuming units per org.

Business analysts

  • Define CRUD stories for create, update, delete, and tier selection.

Solution architects

  • Connect instance records to pool tiers and external orchestration backends.

Developers

  • Use orchestrator APIs and validation rules from cadence.api.orchestrator.

Testers

  • Cover domain validation errors, quota limits, and concurrent updates.

An AI App (historically called an orchestrator instance in the API) is one configured AI runtime for an org. It declares the framework (langgraph or openai_agents), the mode (for example supervisor, coordinator, handoff, or grounded — see Orchestration backends), which plugins are active, and how eagerly it should stay resident in memory (pool tier). The record lives in PostgreSQL; the running runtime lives in OrchestratorPool. HTTP routes are under /api/orgs/{org_id}/orchestrators.

Creating an instance writes a configuration record but does not start a runtime. The pool holds the instance in memory only after it is loaded; an explicit load call publishes an async event that a worker processes. Once loaded, the instance handles chat requests; while it is unloaded, chat requests for it receive 503.

SettingsService.validate_orchestrator_config runs before any record is written. It checks that the requested framework and mode are a valid combination, that the LLM configuration resolves correctly for this org, and that all referenced plugins exist in the org’s catalog. A failed validation returns 422 before anything is persisted.
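The validate-before-persist rule described above can be sketched as follows. This is a minimal illustration, not the code behind SettingsService.validate_orchestrator_config: the framework/mode matrix, the OrchestratorConfig fields, the helper names, and the 201 success status are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# ASSUMPTION: this framework/mode matrix is invented for illustration.
# The real valid combinations are defined by the service.
ASSUMED_VALID_MODES = {
    "langgraph": {"supervisor", "coordinator", "grounded"},
    "openai_agents": {"supervisor", "handoff"},
}


@dataclass
class OrchestratorConfig:
    """Hypothetical shape of an AI App configuration."""
    framework: str
    mode: str
    llm_config_id: str
    plugins: list = field(default_factory=list)


def validate_config(config, org_llm_config_ids, org_plugin_catalog):
    """Return a list of error messages; an empty list means the config is valid."""
    errors = []

    # 1. Framework and mode must be a valid combination.
    modes = ASSUMED_VALID_MODES.get(config.framework)
    if modes is None:
        errors.append(f"unknown framework: {config.framework}")
    elif config.mode not in modes:
        errors.append(f"mode {config.mode!r} is not valid for {config.framework}")

    # 2. The LLM configuration must resolve for this org.
    if config.llm_config_id not in org_llm_config_ids:
        errors.append(f"LLM config {config.llm_config_id!r} not found for org")

    # 3. Every referenced plugin must exist in the org's catalog.
    missing = sorted(set(config.plugins) - set(org_plugin_catalog))
    if missing:
        errors.append("plugins not in org catalog: " + ", ".join(missing))

    return errors


def create_instance(config, org_llm_config_ids, org_plugin_catalog, store):
    """Validate first; append to `store` only if validation passes."""
    errors = validate_config(config, org_llm_config_ids, org_plugin_catalog)
    if errors:
        return 422, {"errors": errors}   # nothing has been persisted
    store.append(config)
    return 201, {"framework": config.framework, "mode": config.mode}  # 201 is assumed
```

Collecting every error before returning, rather than stopping at the first, lets a caller fix a rejected configuration in one pass. That choice belongs to this sketch; the source does not say how the service reports multiple failures.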

  1. Create: POST /api/orgs/{org_id}/orchestrators with roles_allowed(ORG_ORCHESTRATORS_WRITE). org_context enforces membership. SettingsService.validate_orchestrator_config checks framework, mode, LLM configuration, and plugin references.
  2. Persist: OrchestratorService and its repositories write the instance record; publish_after optionally emits a creation event to the message bus.
  3. Load: Call POST /api/orgs/{org_id}/orchestrators/{instance_id}/load. The handler validates org ownership and publishes a load event. The worker calls OrchestratorPool.create_instance and the instance becomes available for chat. The HTTP response returns 202 Accepted before the worker finishes; see Hot-reload and AI App pool.
  4. Chat: Clients call POST /api/chat/completion with X-ORG-ID and X-INSTANCE-ID. See Chat and engine.
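From a client's point of view, the steps above reduce to three HTTP calls. The sketch below only builds request descriptions and sends nothing, so it runs without a server. The routes and the X-ORG-ID and X-INSTANCE-ID headers come from the text above; the PlannedRequest type, the function names, and the request body shapes are assumptions, and authentication headers are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlannedRequest:
    """Hypothetical description of an HTTP call (not a real client type)."""
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: Optional[dict] = None


def create_request(org_id: str, config: dict) -> PlannedRequest:
    # Step 1: write the configuration record. No runtime starts yet.
    return PlannedRequest("POST", f"/api/orgs/{org_id}/orchestrators", body=config)


def load_request(org_id: str, instance_id: str) -> PlannedRequest:
    # Step 3: ask the pool to load the instance. Expect 202 Accepted,
    # which means the event was published, not that the runtime is ready.
    return PlannedRequest(
        "POST", f"/api/orgs/{org_id}/orchestrators/{instance_id}/load"
    )


def chat_request(org_id: str, instance_id: str, payload: dict) -> PlannedRequest:
    # Step 4: chat is addressed through headers, not through the URL path.
    return PlannedRequest(
        "POST",
        "/api/chat/completion",
        headers={"X-ORG-ID": org_id, "X-INSTANCE-ID": instance_id},
        body=payload,
    )


def lifecycle_plan(org_id, instance_id, config, payload):
    """Return the client-side calls in the order a client would issue them."""
    return [
        create_request(org_id, config),
        load_request(org_id, instance_id),
        chat_request(org_id, instance_id, payload),
    ]
```

In practice the instance_id would come from the create response; it is passed in here only to keep the sketch self-contained.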

Routes use roles_allowed with these permission constants:

Operation                        Permission
Read instance list and details   cadence:org:orchestrators:read
Create, update, delete           cadence:org:orchestrators:write
Load, unload, reload             cadence:org:orchestrators:lifecycle
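The permission mapping can be expressed as a small lookup, which is handy when writing tests for role coverage. The permission strings are taken from the table; the operation keys and the is_allowed helper are hypothetical and do not show how roles_allowed is implemented.

```python
READ = "cadence:org:orchestrators:read"
WRITE = "cadence:org:orchestrators:write"
LIFECYCLE = "cadence:org:orchestrators:lifecycle"

# Hypothetical operation names mapped to the permission each one requires.
OPERATION_PERMISSION = {
    "list": READ,
    "get": READ,
    "create": WRITE,
    "update": WRITE,
    "delete": WRITE,
    "load": LIFECYCLE,
    "unload": LIFECYCLE,
    "reload": LIFECYCLE,
}


def is_allowed(operation: str, granted: set) -> bool:
    """Return True if the caller's granted permissions cover the operation."""
    required = OPERATION_PERMISSION.get(operation)
    # Unknown operations are denied rather than silently allowed.
    return required is not None and required in granted
```

Note that in this sketch write access does not imply lifecycle access: a caller who can create an AI App still needs the lifecycle permission to load it.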

During application startup, load_hot_tier_instances runs after plugin catalog sync. It loads all active AI Apps whose tier is hot so they are ready before the first chat request arrives. Instances on demand tier are not preloaded — they load into the demand pool on first use (or via an explicit load call).
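The startup selection rule can be sketched as a filter followed by a load loop. This is an illustration under assumptions, not the body of load_hot_tier_instances: the AppRecord fields, the "hot" and "demand" string values, and the injected load callable are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AppRecord:
    """Hypothetical projection of a stored AI App record."""
    instance_id: str
    tier: str      # assumed values: "hot" or "demand"
    active: bool


def select_hot_tier(records: Iterable[AppRecord]) -> List[AppRecord]:
    # Only active AI Apps on the hot tier are preloaded at startup.
    return [r for r in records if r.active and r.tier == "hot"]


def preload_hot_tier(
    records: Iterable[AppRecord], load: Callable[[AppRecord], None]
) -> List[str]:
    """Load every active hot-tier record and return the loaded instance IDs."""
    loaded = []
    for record in select_hot_tier(records):
        load(record)                      # e.g. hand the record to the pool
        loaded.append(record.instance_id)
    return loaded
```

Demand-tier and inactive records are skipped, which matches the behavior described above: they stay out of memory until first use or an explicit load call.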

  • Pool capacity: Large numbers of hot instances increase memory and startup time. Use demand tier for low-traffic AI Apps and promote to hot before expected load spikes.
  • Async load: 202 Accepted means the event was published, not that the instance is ready. Poll pool stats or wait before sending chat requests.
  • RabbitMQ optional: If the message bus is unavailable, event_publisher may be None. Load and unload handlers tolerate this and write the record without broadcasting.
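Two of the notes above lend themselves to short sketches: tolerating a missing publisher, and polling after a 202 response. Both helpers are hypothetical. The publish method name on event_publisher is an assumption, and is_loaded stands in for whatever readiness check is available (for example, a pool stats lookup), since the source does not specify one.

```python
import time
from typing import Callable


def publish_if_available(event_publisher, event: dict) -> bool:
    """Publish the event when a publisher exists; otherwise skip quietly.

    Returns True if the event was handed to the publisher.
    """
    if event_publisher is None:
        return False          # message bus unavailable: do not broadcast
    event_publisher.publish(event)   # ASSUMPTION: method name is `publish`
    return True


def wait_until_loaded(
    is_loaded: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_loaded` until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_loaded():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would send the load request, then call wait_until_loaded with a readiness check before issuing the first chat request. The sleep and clock parameters are injectable so the loop can be tested without real delays.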