AI Apps

Orchestrator

CRUD, lifecycle load/unload, pool tiers, and domain validation.

Intended audience: Stakeholders, Business analysts, Solution architects, Developers, Testers

Learning outcomes by role

Stakeholders

Explain AI Apps as billable or capacity-consuming units per org.

Business analysts

Define CRUD stories for create, update, delete, and tier selection.

Solution architects

Connect instance records to pool tiers and external orchestration backends.

Developers

Use orchestrator APIs and validation rules from cadence.api.orchestrator.

Testers

Cover domain validation errors, quota limits, and concurrent updates.

An AI App (historically called an orchestrator instance in the API) is one configured AI runtime for an org. It declares the framework (langgraph or openai_agents), the mode (for example supervisor, coordinator, handoff, or grounded; see Orchestration backends), which plugins are active, and how eagerly it should stay resident in memory (its pool tier). The configuration record lives in PostgreSQL; the live runtime lives in OrchestratorPool. HTTP routes are under /api/orgs/{org_id}/orchestrators.
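To make those fields concrete, here is a minimal Python sketch of what an AI App record carries. The class names (PoolTier, AIAppRecord) and field names are hypothetical illustrations, not the actual cadence models; only the field meanings, tier names, and route prefix come from this page.

```python
from dataclasses import dataclass
from enum import Enum


class PoolTier(str, Enum):
    """Pool tiers named on this page; the enum itself is illustrative."""
    HOT = "hot"        # preloaded at startup, kept resident
    DEMAND = "demand"  # loaded on first use or by an explicit load call


@dataclass(frozen=True)
class AIAppRecord:
    """Hypothetical shape of a persisted AI App (orchestrator instance) record."""
    instance_id: str
    org_id: str
    framework: str                 # "langgraph" or "openai_agents"
    mode: str                      # e.g. "supervisor", "coordinator", "handoff", "grounded"
    tier: PoolTier
    plugins: tuple[str, ...] = ()  # plugin references checked against the org catalog
    active: bool = True

    def item_path(self) -> str:
        # Per-instance routes sit under the org's orchestrators collection.
        return f"/api/orgs/{self.org_id}/orchestrators/{self.instance_id}"
```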

How instances are created and used

Creating an instance writes a configuration record but does not start a runtime. An instance enters the pool through an explicit load call, which publishes an async event that a worker processes; hot-tier instances are also preloaded at startup, and demand-tier instances load on first use (see Startup preloading). While loaded, the instance handles chat requests; once unloaded, chat requests for it return 503.
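The create/load/chat lifecycle can be modeled with a small in-memory sketch. LifecycleSketch and its methods are hypothetical: one set stands in for PostgreSQL, another for OrchestratorPool, and a deque for the message bus. It models only explicit load and unload, not hot-tier preloading or demand-tier loading on first use.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LifecycleSketch:
    """In-memory model of create -> load event -> worker -> chat (illustrative only)."""
    records: set[str] = field(default_factory=set)  # persisted configs (stands in for PostgreSQL)
    pool: set[str] = field(default_factory=set)     # loaded runtimes (stands in for OrchestratorPool)
    events: deque = field(default_factory=deque)    # pending async events (stands in for the bus)

    def create(self, instance_id: str) -> None:
        # Writes configuration only; no runtime is started.
        self.records.add(instance_id)

    def request_load(self, instance_id: str) -> int:
        if instance_id not in self.records:
            # Hypothetical: the real handler's error response is not described here.
            raise KeyError(instance_id)
        self.events.append(("load", instance_id))
        return 202  # accepted: the worker has not run yet

    def request_unload(self, instance_id: str) -> int:
        self.events.append(("unload", instance_id))
        return 202

    def run_worker(self) -> None:
        # Drain pending events, standing in for the async worker.
        while self.events:
            action, instance_id = self.events.popleft()
            if action == "load":
                self.pool.add(instance_id)
            else:
                self.pool.discard(instance_id)

    def chat(self, instance_id: str) -> int:
        # Only loaded instances serve chat; others return 503.
        return 200 if instance_id in self.pool else 503
```

The point the sketch makes is the gap between `request_load` returning 202 and `run_worker` actually adding the instance to the pool: a chat call in that window still gets 503.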

SettingsService.validate_orchestrator_config runs before any record is written. It checks that the requested framework and mode are a valid combination, that the LLM configuration resolves correctly for this org, and that all referenced plugins exist in the org’s catalog. A failed validation returns 422 before anything is persisted.
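The three checks can be sketched as a pure function that collects every error before anything is persisted. validate_config, ValidationResult, and the valid_modes mapping are assumptions for illustration: the real framework/mode combinations and LLM resolution live in SettingsService, so this sketch takes them as inputs instead of guessing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    errors: tuple[str, ...]

    @property
    def ok(self) -> bool:
        # Any error means the request is rejected with 422 and nothing is written.
        return not self.errors


def validate_config(
    framework: str,
    mode: str,
    plugins: list[str],
    valid_modes: dict[str, set[str]],  # hypothetical: framework -> allowed modes
    plugin_catalog: set[str],          # plugin ids in the org's catalog
    llm_config_resolves: bool,         # outcome of resolving the org's LLM config
) -> ValidationResult:
    errors = []
    # 1. Framework and mode must be a valid combination.
    if mode not in valid_modes.get(framework, set()):
        errors.append(f"invalid framework/mode combination: {framework}/{mode}")
    # 2. The LLM configuration must resolve for this org.
    if not llm_config_resolves:
        errors.append("LLM configuration does not resolve for this org")
    # 3. Every referenced plugin must exist in the org's catalog.
    missing = sorted(set(plugins) - plugin_catalog)
    if missing:
        errors.append(f"unknown plugins: {', '.join(missing)}")
    return ValidationResult(tuple(errors))
```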

Typical flow

Create: POST /api/orgs/{org_id}/orchestrators with roles_allowed(ORG_ORCHESTRATORS_WRITE). org_context enforces org membership, and SettingsService.validate_orchestrator_config checks the framework, mode, config, and plugin references.
Persist: OrchestratorService and its repositories write the instance record; publish_after optionally emits a creation event to the message bus.
Load: POST /api/orgs/{org_id}/orchestrators/{instance_id}/load (requires the lifecycle permission). The handler validates org ownership and publishes a load event; a worker then calls OrchestratorPool.create_instance, after which the instance is available for chat. The HTTP response returns 202 Accepted before the worker finishes; see Hot-reload and AI App pool.
Chat: clients call POST /api/chat/completion with the X-ORG-ID and X-INSTANCE-ID headers. See Chat and engine.
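The three HTTP calls in this flow can be built with the standard library's urllib.request.Request. The host, the create payload fields, and the chat body shape are placeholders (this page does not document them), and authentication headers are omitted; the sketch only constructs the requests and does not send them.

```python
import json
import urllib.request

BASE_URL = "https://cadence.example.com"  # hypothetical host


def create_request(org_id: str, payload: dict) -> urllib.request.Request:
    # Create: payload field names are not documented here; pass your own.
    return urllib.request.Request(
        f"{BASE_URL}/api/orgs/{org_id}/orchestrators",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_request(org_id: str, instance_id: str) -> urllib.request.Request:
    # Load: expect 202 Accepted; the instance is not ready yet.
    return urllib.request.Request(
        f"{BASE_URL}/api/orgs/{org_id}/orchestrators/{instance_id}/load",
        method="POST",
    )


def chat_request(org_id: str, instance_id: str, body: dict) -> urllib.request.Request:
    # Chat: the org and instance are selected by headers, not by the URL.
    return urllib.request.Request(
        f"{BASE_URL}/api/chat/completion",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "X-ORG-ID": org_id,
            "X-INSTANCE-ID": instance_id,
        },
        method="POST",
    )
```

Note that urllib normalizes stored header names (for example to `X-org-id`); the header values sent on the wire are unchanged.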

Permissions

Routes enforce access through roles_allowed. Each group of operations requires the following permission:

Operation	Permission
Read instance list and details	`cadence:org:orchestrators:read`
Create, update, delete	`cadence:org:orchestrators:write`
Load, unload, reload	`cadence:org:orchestrators:lifecycle`
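A sketch of how the table maps operations to permission strings. The operation keys and helper names are hypothetical; the sketch assumes each operation requires exactly the listed permission and that holding one permission does not imply another, which this page does not state either way.

```python
# Permission strings copied from the table above.
PERMISSIONS = {
    "read": "cadence:org:orchestrators:read",
    "write": "cadence:org:orchestrators:write",
    "lifecycle": "cadence:org:orchestrators:lifecycle",
}

# Hypothetical operation names grouped as in the table.
OPERATION_SCOPE = {
    "list": "read", "get": "read",
    "create": "write", "update": "write", "delete": "write",
    "load": "lifecycle", "unload": "lifecycle", "reload": "lifecycle",
}


def required_permission(operation: str) -> str:
    return PERMISSIONS[OPERATION_SCOPE[operation]]


def is_allowed(granted: set[str], operation: str) -> bool:
    # Exact match only: no permission implies another in this sketch.
    return required_permission(operation) in granted
```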

Startup preloading

During application startup, load_hot_tier_instances runs after plugin catalog sync. It loads every active AI App whose tier is hot, so those instances are ready before the first chat request arrives. Demand-tier instances are not preloaded; they load into the demand pool on first use or through an explicit load call.
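The startup ordering can be sketched as follows. AppRow, preload_hot_tier, and startup are hypothetical names standing in for load_hot_tier_instances and the real catalog sync; the "hot" and "demand" tier values and the sync-then-preload order come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class AppRow:
    instance_id: str
    tier: str     # "hot" or "demand"
    active: bool


def preload_hot_tier(rows: Iterable[AppRow], load: Callable[[str], None]) -> list[str]:
    """Load every active hot-tier app; demand-tier and inactive apps are skipped."""
    loaded = []
    for row in rows:
        if row.active and row.tier == "hot":
            load(row.instance_id)
            loaded.append(row.instance_id)
    return loaded


def startup(
    sync_plugin_catalog: Callable[[], None],
    rows: Iterable[AppRow],
    load: Callable[[str], None],
) -> list[str]:
    # Catalog sync runs first; hot-tier preloading follows.
    sync_plugin_catalog()
    return preload_hot_tier(rows, load)
```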

Limitations

Pool capacity: every hot instance is loaded at startup and stays resident, so many hot instances increase memory use and startup time. Use the demand tier for low-traffic AI Apps and promote them to hot before expected load spikes.
Async load: 202 Accepted means the load event was published, not that the instance is ready. Poll pool stats or wait before sending chat requests; an instance that is not loaded yet returns 503.
RabbitMQ optional: if the message bus is unavailable, event_publisher may be None. The load and unload handlers tolerate this and write the record without broadcasting an event.
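Because a 202 does not mean the instance is ready, a client can poll with exponential backoff and a deadline. wait_until_ready is a hypothetical helper, and is_loaded is any probe you supply, for example a check against pool stats or a chat call that no longer returns 503. The sleep and clock parameters exist so the function can be tested without real waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    is_loaded: Callable[[], bool],
    timeout: float = 30.0,
    initial_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_loaded() until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if is_loaded():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```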

Next steps

Chat and engine: completion routes and required headers.

Hot-reload and pool: tiers, load/unload, and 202 async semantics.

Orchestration modes: supervisor vs. grounded and mode_config fields.