Quickstart
from zero to a running engine + dashboard in one command
npx @dicabrio/durable
# → dashboard + API on http://localhost:3030
This starts an embedded Postgres (the first run downloads the binary; data persists in ~/.durable/pgdata), applies all migrations, and serves the dashboard and API on one port. No Docker required.
Connect an app (TypeScript)
import { createFunction, serve, DurableClient } from "@durable/sdk";
const hello = createFunction({
id: "hello",
trigger: { event: "demo.hello" },
handler: async ({ event, step }) => {
return step.run("greet", () => `hi ${event.data.name}`);
},
});
// 1. create a workspace in the dashboard (or POST /api/apps) → id + signing key
const client = new DurableClient({
baseUrl: "http://localhost:3030",
appId: process.env.APP_ID, signingKey: process.env.APP_KEY,
appUrl: "http://localhost:4000/api/durable",
});
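// 2. expose the callback + register your functions
// (`app` is your HTTP server, e.g. an Express app; the key is the workspace signing key from step 1)
app.post("/api/durable", serve([hello], { signingKey: process.env.APP_KEY }));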
await client.sync([hello]);
// 3. fire events
await client.send({ name: "demo.hello", data: { name: "world" } });
Running npm run dev in the repo starts the full stack (Postgres in Docker, Adminer, the service, a worker, and three demo apps) via process-compose.
Concepts
the five nouns, and the replay model that makes them durable
| Term | Meaning |
|---|---|
| event | A named fact (user.created) with a JSON payload, sent into one app's workspace. |
| function | Your handler plus its trigger and options, registered via sync. |
| run | One execution of a function for one event. |
| step | A named unit inside a run (step.run("send-email", …)) that executes exactly once. |
| workspace | An app × environment pair — fully isolated data, functions and signing key. |
The replay model
The engine never runs your code. It POSTs the run's state — the triggering event plus all memoized step results — to your app. The SDK calls your handler from the top: completed steps return their stored results instantly (no side effects), and the first new step executes for real. Its result is persisted and the cycle repeats, one step per round-trip, until the function returns.
Because each invocation starts from the top, your handler must be deterministic between steps: put every side effect (DB write, API call, randomness, Date.now()) inside a step.run.
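The replay loop described above can be modelled in a few lines. This is a minimal, synchronous, in-memory sketch and not the SDK's actual code: `makeStep`, `StepExecuted` and `runToCompletion` are hypothetical names, and the real protocol is asynchronous and crosses HTTP on every round-trip.

```typescript
// Hypothetical in-memory model of the replay loop; not the SDK's real code.
type StepResults = Record<string, unknown>;

// Thrown to unwind the handler once one new step has executed.
class StepExecuted {
  id: string;
  data: unknown;
  constructor(id: string, data: unknown) {
    this.id = id;
    this.data = data;
  }
}

function makeStep(memo: StepResults) {
  return {
    run<T>(id: string, fn: () => T): T {
      if (id in memo) return memo[id] as T; // replayed: no side effect
      throw new StepExecuted(id, fn()); // first new step: runs for real
    },
  };
}

type Handler = (step: ReturnType<typeof makeStep>) => unknown;

// Plays the engine's role: re-invoke from the top until the handler returns.
function runToCompletion(handler: Handler): { result: unknown; roundTrips: number } {
  const memo: StepResults = {};
  let roundTrips = 0;
  for (;;) {
    roundTrips++;
    try {
      return { result: handler(makeStep(memo)), roundTrips };
    } catch (signal) {
      if (!(signal instanceof StepExecuted)) throw signal;
      memo[signal.id] = signal.data; // persist the result, then replay
    }
  }
}

let sideEffects = 0;
const outcome = runToCompletion((step) => {
  const user = step.run("load-user", () => {
    sideEffects++;
    return { name: "world" };
  });
  return step.run("greet", () => {
    sideEffects++;
    return `hi ${user.name}`;
  });
});
```

In this model the handler is entered three times (two steps plus the final return), yet each step body runs only once. Code placed outside `step.run` would run on every entry, which is why side effects belong inside steps.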
Run state, two layers
| Layer | Values | Question it answers |
|---|---|---|
| status | active · completed · failed · cancelled | Is the run finished? |
| activity | executing · queued · waiting · sleeping · scheduled | What is an active run doing right now? |
A run waiting seven days for an approval is active · waiting — alive, but consuming no compute and no worker slot.
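One way to picture the two layers is as a discriminated union in which `activity` only exists while the run is active. The type names and the `usesWorkerSlot` helper are illustrative assumptions, as is the rule that only `executing` occupies a worker slot; the engine's own types may differ.

```typescript
type Activity = "executing" | "queued" | "waiting" | "sleeping" | "scheduled";

// Assumption: activity is only meaningful while status is "active".
type RunState =
  | { status: "active"; activity: Activity }
  | { status: "completed" | "failed" | "cancelled" };

// Assumption: only an executing run occupies a worker slot.
function usesWorkerSlot(run: RunState): boolean {
  return run.status === "active" && run.activity === "executing";
}
```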
Steps API
three primitives cover almost every workflow
step.run(id, fn)
Execute a side effect once; the result is memoized and replayed forever after. A thrown error triggers a retry with exponential backoff; once the attempts are exhausted (DURABLE_MAX_ATTEMPTS, default 3), the run fails.
const invoice = await step.run("create-invoice", () => billing.create(order));
step.sleep(id, duration)
Durable pause — "90s", "12h", "30d" or milliseconds. No process waits; a timer wakes the run. Survives restarts and deploys.
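A duration string has to become an absolute wake-up time before it can be stored. The sketch below shows one plausible conversion; `parseDuration` and `sleepUntil` are hypothetical helpers, and the supported units (s, m, h, d) are inferred from the examples in this document rather than taken from the SDK.

```typescript
// Hypothetical helper; the unit set is inferred from the examples in this document.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

function parseDuration(input: string | number): number {
  if (typeof input === "number") return input; // already milliseconds
  const match = /^(\d+)([smhd])$/.exec(input.trim());
  if (!match) throw new Error(`unsupported duration: ${input}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Turns a relative duration into an absolute ISO timestamp a timer can wake on.
function sleepUntil(duration: string | number, now: Date = new Date()): string {
  return new Date(now.getTime() + parseDuration(duration)).toISOString();
}
```

Storing an absolute timestamp rather than a countdown is what lets a sleep survive a restart: nothing has to keep counting while the process is down.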
step.waitForEvent(id, { event, match?, timeout })
Park the run until a matching event arrives, or the timeout elapses. match is a subset check against the incoming event.data. Resolves with the event, or null on timeout — human-in-the-loop in four lines:
const approval = await step.waitForEvent("approve", {
event: "approval.received",
match: { orderId: event.data.orderId },
timeout: "7d",
});
if (!approval) return { rejected: "timeout" };
Triggers & cron
event-driven or on a schedule
trigger: { event: "order.paid" } // runs per matching event
trigger: { cron: "0 3 * * *" } // daily at 03:00
trigger: { cron: "*/20 * * * * *" } // 6-field: every 20 seconds
Cron runs receive a synthetic $cron event. Schedules never double-fire (row locks) and never storm after downtime: the next occurrence is always computed strictly in the future.
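The "no storm after downtime" property follows from how the next fire time is derived. The sketch below simplifies a cron expression to a fixed interval (real cron syntax needs a proper parser) and uses a hypothetical `nextOccurrence` function; it is an illustration of the idea, not the engine's scheduler.

```typescript
// Simplified to a fixed interval; real cron expressions need a proper parser.
function nextOccurrence(anchorMs: number, intervalMs: number, nowMs: number): number {
  if (intervalMs <= 0) throw new Error("interval must be positive");
  if (nowMs < anchorMs) return anchorMs;
  // Count the ticks that have already passed, then jump to the one after them.
  const elapsedTicks = Math.floor((nowMs - anchorMs) / intervalMs);
  return anchorMs + (elapsedTicks + 1) * intervalMs; // strictly after now
}
```

Because the result depends only on the current time, a schedule that was down for an hour resumes with a single upcoming tick instead of replaying every tick it missed.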
Flow control
six per-function policies, all with an optional per-key scope
| Option | Effect on a burst | Use for |
|---|---|---|
| concurrency | max N executing at once; excess queues | protecting APIs & resources |
| priority | higher starts sooner under contention | VIP tenants, critical work |
| throttle | starts spread over time; nothing dropped | external rate limits |
| rateLimit | excess runs dropped | abuse, duplicate webhooks |
| debounce | burst collapses to one run with the last event, after quiet | rapid saves → one reindex |
| batch | events grouped; one run gets the whole list | bulk writes, metric ingestion |
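The difference between throttle and rateLimit in the table above is easiest to see on a burst. The following is a toy model under stated assumptions: arrival times are sorted millisecond timestamps, rateLimit uses a fixed window, and throttle spaces starts evenly. The engine's actual algorithms are not specified here and may differ.

```typescript
// Hypothetical model: timestamps in ms, a policy of `limit` starts per `periodMs`.
interface Policy {
  limit: number;
  periodMs: number;
}

// rateLimit: events beyond the limit inside a fixed window are dropped.
function rateLimit(arrivals: number[], p: Policy): number[] {
  const started: number[] = [];
  let windowStart = -Infinity;
  let count = 0;
  for (const t of arrivals) {
    if (t - windowStart >= p.periodMs) {
      windowStart = t;
      count = 0;
    }
    if (count < p.limit) {
      started.push(t);
      count++;
    }
  }
  return started;
}

// throttle: nothing is dropped; starts are spaced at least periodMs / limit apart.
function throttle(arrivals: number[], p: Policy): number[] {
  const gap = p.periodMs / p.limit;
  let nextFree = -Infinity;
  return arrivals.map((t) => {
    const start = Math.max(t, nextFree);
    nextFree = start + gap;
    return start;
  });
}

const burst = [0, 0, 0, 0];
const policy: Policy = { limit: 1, periodMs: 3000 };
const kept = rateLimit(burst, policy); // one run survives, three are dropped
const spread = throttle(burst, policy); // all four run, three seconds apart
```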
createFunction({
id: "sync-crm",
trigger: { event: "contact.changed" },
concurrency: { limit: 2, key: "tenantId" }, // per tenant
priority: 10,
throttle: { limit: 1, period: "3s" },
rateLimit: { limit: 100, period: "1m" },
debounce: { period: "5s", key: "contactId" },
// batch: { maxSize: 25, timeout: "10s" } → event.data = the list
handler: async ({ event, step }) => { /* … */ },
});
Apps & environments
isolation is the default, environments are explicit
An app identity is (name, environment). Every combination is a fully isolated workspace: its own app_id, its own signing key, its own events, functions and runs. An event fired into billing · acc can never trigger billing · prod.
The environment defaults to dev — you never set it locally. For acceptance and production you opt in explicitly (DURABLE_ENV=acc|prod, or pick it when creating the workspace). The dashboard shows loud color-coded badges: dev grey, acc amber, prod red.
# idempotent per (name, environment) — returns the same app + key every time
curl -X POST http://localhost:3030/api/apps -H 'content-type: application/json' -d '{"name":"billing","environment":"prod"}'
Auth: every app→service call carries x-durable-app plus an HMAC-SHA256 signature over the raw body; service→app callbacks are signed with the same per-workspace key.
Dashboard
realtime, app-centric, safe in production
- Realtime everywhere — Postgres NOTIFY → SSE push; every screen updates the moment data changes.
- Trace drawer — click a run: a waterfall per step, each bar split grey (durable queue/sleep time) vs green (your server's execution time); click a step for its input/output; expand for the exact split.
- Run actions — Rerun (from scratch), Rerun from step (steps before it are reused, the chosen step re-executes), Cancel.
- Metrics — throughput, failure rate, durable-delay vs app-time, per-function p95, and backlog depth over time (1h / 24h / 7d).
- Prod guards — in a prod workspace, Fire/Run/Cancel/Rerun arm on first click and execute only on a confirming second click.
- Production auth — set DURABLE_ADMIN_TOKEN and the dashboard, tRPC and SSE surfaces require sign-in (httpOnly session cookie or a Bearer token). Unset = open, for local dev.
PHP SDK
same replay model, dependency-free, PHP ≥ 8.1
use Durable\{Client, DurableFunction, Serve, Step};
$fn = new DurableFunction(
id: 'onboarding',
trigger: ['event' => 'user.created'],
handler: function (array $event, Step $step) {
$user = $step->run('load-user', fn () => loadUser($event['data']['id']));
$step->sleep('cooldown', '3s');
return $step->run('send-email', fn () => sendMail($user));
},
);
// callback endpoint (vanilla PHP, Laravel, Symfony — anything):
Serve::handle([$fn], $signingKey);
// register + fire:
$client = new Client($baseUrl, $appId, $key, $appUrl);
$client->sync([$fn]);
$client->send('user.created', ['id' => 'u1']);
All function options (concurrency, priority, throttle, rateLimit, debounce, batch) are supported with human-readable periods ('3s', '7d'). See sdk-php/example/ for a runnable app on PHP's built-in server.
Operations
scaling, shutdown, configuration
Scaling workers
Queue capacity is a process count. The SKIP LOCKED queue (with an exact, advisory-locked concurrency gate) makes concurrent instances safe — run as many worker-only processes as you need:
npm run worker # pure capacity: no HTTP, no scheduler
Graceful drain
On SIGINT/SIGTERM the service stops pulling new jobs, finishes what's in flight (bounded by DURABLE_DRAIN_TIMEOUT_MS, default 15s), then exits. On timeout, abandoned jobs recover via lease expiry — nothing is lost either way. A second signal forces exit. App callbacks are bounded to the job lease, so a hung app can't wedge a drain.
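Recovery via lease expiry can be illustrated with an in-memory stand-in for the job table. The `Job` shape and `claim` function are hypothetical; the real queue lives in Postgres and claims rows with SKIP LOCKED, which this sketch does not attempt to reproduce.

```typescript
// Hypothetical in-memory stand-in for the job table; the real queue uses Postgres.
interface Job {
  id: string;
  leasedUntil: number | null;
  done: boolean;
}

// Claim the first job that is unfinished and either unleased or past its lease.
function claim(jobs: Job[], nowMs: number, leaseMs: number): Job | undefined {
  const job = jobs.find(
    (j) => !j.done && (j.leasedUntil === null || j.leasedUntil <= nowMs),
  );
  if (job) job.leasedUntil = nowMs + leaseMs;
  return job;
}

const jobs: Job[] = [{ id: "a", leasedUntil: null, done: false }];
const first = claim(jobs, 0, 30_000); // worker 1 takes the job, then dies
const blocked = claim(jobs, 1_000, 30_000); // lease still held: nothing to claim
const recovered = claim(jobs, 30_000, 30_000); // lease expired: another worker picks it up
```

A worker that is killed mid-job never has to release anything. The job becomes claimable again once its lease runs out, which is why a drain timeout loses no work.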
Environment variables
| Variable | Default | Purpose |
|---|---|---|
| PORT | 3030 | service + dashboard port |
| DATABASE_URL | — | Postgres connection (unused with embedded PG) |
| DURABLE_PG_PORT / DURABLE_PG_DIR | 5434 / ~/.durable/pgdata | embedded Postgres |
| DURABLE_WORKERS | 2 | worker loops per process |
| DURABLE_LEASE_MS | 30000 | job lease + app-call timeout |
| DURABLE_MAX_ATTEMPTS | 3 | step retries before a run fails |
| DURABLE_DRAIN_TIMEOUT_MS | 15000 | graceful-drain bound |
| DURABLE_ADMIN_TOKEN | unset | set → dashboard/tRPC/SSE require sign-in |
| DURABLE_ENV | dev | app environment on self-provisioning |
Wire protocol
small enough to port an SDK in an afternoon
One HTTP round-trip advances a run by at most one new step. All bodies are JSON, and every request and response is signed with x-durable-signature: hex(hmac_sha256(rawBody, key)). App→service calls also send x-durable-app: <appId>.
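The signature scheme above maps directly onto Node's built-in crypto module. This is a sketch for someone porting an SDK: `sign`, `verify` and `signedHeaders` are illustrative names rather than the SDK's exports, and the content-type header is an assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// hex(hmac_sha256(rawBody, key)), computed over the exact bytes that are sent.
function sign(rawBody: string, key: string): string {
  return createHmac("sha256", key).update(rawBody, "utf8").digest("hex");
}

// Constant-time comparison avoids leaking how many leading characters matched.
function verify(rawBody: string, key: string, signature: string): boolean {
  const expected = Buffer.from(sign(rawBody, key), "hex");
  const received = Buffer.from(signature, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Headers for an app→service call (content-type is an assumption).
function signedHeaders(rawBody: string, appId: string, key: string): Record<string, string> {
  return {
    "content-type": "application/json",
    "x-durable-app": appId,
    "x-durable-signature": sign(rawBody, key),
  };
}
```

Sign and verify the raw body string, not a re-serialized object: parsing and re-stringifying JSON can change whitespace or key order and invalidate the signature.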
Service → app (invoke)
POST {appUrl}
{ "runId": "…", "functionId": "onboarding",
"event": { "id": "…", "name": "user.created", "data": { … } },
"steps": { "load-user": { "type": "run", "data": { … } } } }
App → service (the next operation reached)
{ "op": "step", "id": "send-email", "data": … }
{ "op": "sleep", "id": "cooldown", "until": "2026-07-05T09:00:00.000Z" }
{ "op": "wait", "id": "approve", "event": "approval.received",
"match": { "orderId": "o1" } | null, "until": "…" }
{ "op": "done", "data": … }
{ "op": "error", "id": "send-email" | null, "message": "…", "retryable": true }
App → service (management)
POST /e { "name": "user.created", "data": { … } }
POST /fn/sync { "url": "https://app/api/durable", "functions": [ { "id", "trigger", …options } ] }
POST /api/apps { "name": "billing", "environment": "dev" } # provision (admin)
That's the whole surface an SDK needs: sign, sync, send, and answer invokes with one of five ops. The TypeScript and PHP implementations are both under 300 lines.
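For a port, the five ops can be captured as one discriminated union. The field shapes below are transcribed from the examples above; which fields are optional is an assumption, and the `nextAction` labels merely describe what an engine might plausibly do with each answer.

```typescript
// Shapes transcribed from the examples above; optionality is an assumption.
type Op =
  | { op: "step"; id: string; data: unknown }
  | { op: "sleep"; id: string; until: string }
  | { op: "wait"; id: string; event: string; match: Record<string, unknown> | null; until: string }
  | { op: "done"; data: unknown }
  | { op: "error"; id: string | null; message: string; retryable: boolean };

// Descriptive labels only; not the engine's actual behaviour.
function nextAction(reply: Op): string {
  switch (reply.op) {
    case "step":
      return `store result of "${reply.id}" and invoke again`;
    case "sleep":
      return `wake run at ${reply.until}`;
    case "wait":
      return `park until "${reply.event}" or ${reply.until}`;
    case "done":
      return "mark run completed";
    case "error":
      return reply.retryable ? "schedule retry" : "mark run failed";
  }
}
```

With the union in place, the compiler flags any op a handler forgets to cover, which helps when keeping SDKs in several languages aligned.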