Durable execution (Temporal-style)

Wrap side-effectful steps so crashes, deploys, and retries replay from a step log instead of starting over.

When an agent crashes halfway through a 10-step workflow, the user wants it to resume, not start over. Durable execution provides that with two primitives: a StepLogStore for persistence, and a runner.step(id, fn) wrapper that short-circuits to the recorded result when the id already exists in the log.

Install

Ships with @agentskit/runtime.

Wrap side effects in steps

import {
  createDurableRunner,
  createFileStepLog,
} from '@agentskit/runtime'

const store = await createFileStepLog('./runs/user-42.jsonl')
const runner = createDurableRunner({
  store,
  runId: 'user-42-onboard',
  maxAttempts: 3,
  retryDelayMs: 500,
})

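// `email`, createAccount, sendEmail, and stripe come from your application code.
// Each step id must be unique and stable within a run: it is the replay key.
await runner.step('create-account', async () => createAccount({ email }))
await runner.step('send-welcome', async () => sendEmail({ to: email, template: 'welcome' }))
await runner.step('charge-trial', async () => stripe.subscriptions.create({ ... }))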

Re-run the same code with the same runId after a crash: completed steps short-circuit to their recorded values, and only the remaining steps execute.
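
To make the replay behaviour concrete, here is a toy model of the short-circuit logic. It is a sketch, not the @agentskit/runtime implementation: createToyRunner, ToyRecord, and the Map-backed log are hypothetical names used only for illustration, and the "crash" is simulated by throwing between two steps.

```typescript
// Toy model of replay-by-id. NOT the real createDurableRunner.
type ToyRecord = { stepId: string; result: unknown }

function createToyRunner(log: Map<string, ToyRecord>) {
  return {
    async step<T>(stepId: string, fn: () => Promise<T>): Promise<T> {
      const recorded = log.get(stepId)
      if (recorded) {
        // Completed in an earlier run: return the stored value and skip fn.
        return recorded.result as T
      }
      const result = await fn()
      log.set(stepId, { stepId, result })
      return result
    },
  }
}

async function demo(): Promise<{ result: number; executed: string[] }> {
  // The Map stands in for a persistent step log shared across process lifetimes.
  const log = new Map<string, ToyRecord>()
  const executed: string[] = []

  async function workflow(crashAfterFirst: boolean): Promise<number> {
    const runner = createToyRunner(log)
    const a = await runner.step('first', async () => {
      executed.push('first')
      return 1
    })
    if (crashAfterFirst) throw new Error('simulated crash')
    return runner.step('second', async () => {
      executed.push('second')
      return a + 1
    })
  }

  try {
    await workflow(true) // first run dies after 'first' is logged
  } catch {
    // In real life the process would be gone; here we just continue.
  }
  const result = await workflow(false) // resumed run: 'first' is replayed
  return { result, executed }
}
```

In this sketch the body of 'first' runs once even though the workflow function is invoked twice, which is the property the real runner provides through its step log.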

Stores

createInMemoryStepLog() — tests, single-process demos.
createFileStepLog(path) — JSONL on disk, append-only, survives restarts.
Bring your own — anything implementing { append, get, list, clear? } works (Redis, Postgres, S3, etc).
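
As a rough guide to what a custom store might look like, the sketch below backs the four methods with a Map. The method names come from the list above, but the argument lists, the record fields, and the use of Promises are assumptions made for illustration; check the StepLogStore type exported by @agentskit/runtime before writing a real adapter.

```typescript
// Assumed record shape; the real one may differ.
interface StepRecordSketch {
  runId: string
  stepId: string
  status: 'success' | 'failure'
  result?: unknown
  error?: string
}

// Assumed signatures for { append, get, list, clear? }.
interface StepLogStoreSketch {
  append(record: StepRecordSketch): Promise<void>
  get(runId: string, stepId: string): Promise<StepRecordSketch | undefined>
  list(runId: string): Promise<StepRecordSketch[]>
  clear?(runId: string): Promise<void>
}

function createMapStepLog(): StepLogStoreSketch {
  const byRun = new Map<string, StepRecordSketch[]>()
  return {
    async append(record) {
      const records = byRun.get(record.runId) ?? []
      records.push(record)
      byRun.set(record.runId, records)
    },
    async get(runId, stepId) {
      // Append-only log: the most recent record for a step wins.
      const records = byRun.get(runId) ?? []
      return [...records].reverse().find(r => r.stepId === stepId)
    },
    async list(runId) {
      // Return a copy so callers cannot mutate the log.
      return [...(byRun.get(runId) ?? [])]
    },
    async clear(runId) {
      byRun.delete(runId)
    },
  }
}
```

A Redis or Postgres adapter would presumably keep the same four operations and swap the Map for network calls.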

Step contract

A step is idempotent from the log's perspective: fn performs the side effect, and the recorded result captures everything downstream steps need. Don't rely on global state outside the result.

const { userId } = await runner.step('create-account', async () => ({
  userId: await createAccount({ email }),
}))

// Downstream steps use `userId` — NOT global `req.user.id`, which
// might not exist on a resumed run.
await runner.step('send-welcome', async () => sendEmail({ userId }))

Retries

maxAttempts: total attempts per step (default 1 — fail-fast).
retryDelayMs: fixed backoff between attempts (default 0).
A step that fails all attempts is recorded with status: 'failure'; replaying the same stepId re-throws without running again (so you can diagnose without re-executing expensive failing work).

To retry from scratch, call runner.reset() to wipe the log. With an empty log, every step executes again, including steps that previously succeeded.
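
The retry rules above can be sketched as a small loop. This is an illustrative model under stated assumptions, not the library's code: createRetryingRunner, Outcome, and the in-memory log are hypothetical, and only the option names (maxAttempts, retryDelayMs), their defaults, and the reset() behaviour are taken from this page.

```typescript
type Outcome =
  | { status: 'success'; result: unknown }
  | { status: 'failure'; error: string }

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

function createRetryingRunner(opts: { maxAttempts?: number; retryDelayMs?: number }) {
  const maxAttempts = opts.maxAttempts ?? 1 // default: fail fast
  const retryDelayMs = opts.retryDelayMs ?? 0 // default: no delay
  const log = new Map<string, Outcome>()

  async function step<T>(stepId: string, fn: () => Promise<T>): Promise<T> {
    const recorded = log.get(stepId)
    if (recorded) {
      if (recorded.status === 'success') return recorded.result as T
      // Recorded failure: re-throw without running fn again.
      throw new Error(recorded.error)
    }

    let lastError = 'unknown error'
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        const result = await fn()
        log.set(stepId, { status: 'success', result })
        return result
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err)
        // Fixed delay between attempts, none after the last one.
        if (attempt < maxAttempts) await sleep(retryDelayMs)
      }
    }
    log.set(stepId, { status: 'failure', error: lastError })
    throw new Error(lastError)
  }

  // Wiping the log makes every step eligible to run again.
  return { step, reset: () => log.clear() }
}
```

Recording the failure rather than leaving the step unlogged is what lets a later replay surface the same error without repeating the expensive call.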

Observability

createDurableRunner({
  store,
  runId,
  onEvent: e => logger.debug('durable', e),
})

Events: step:replay, step:start, step:success, step:failure.
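
One possible use of these events is a tally that shows how much of a resumed run was served from the log. The sketch below assumes each event carries a type field holding one of the four names above plus a stepId; that shape, and the createEventTally helper itself, are hypothetical and should be checked against the real event type.

```typescript
type DurableEventType = 'step:replay' | 'step:start' | 'step:success' | 'step:failure'

// Assumed event shape; the real payload may carry more or different fields.
interface DurableEventSketch {
  type: DurableEventType
  stepId: string
}

function createEventTally() {
  const counts: Record<DurableEventType, number> = {
    'step:replay': 0,
    'step:start': 0,
    'step:success': 0,
    'step:failure': 0,
  }
  return {
    counts,
    // Pass this as the onEvent option of the runner.
    onEvent(e: DurableEventSketch): void {
      counts[e.type] += 1
    },
    // Fraction of steps answered from the log instead of executed.
    replayRatio(): number {
      const total = counts['step:replay'] + counts['step:start']
      return total === 0 ? 0 : counts['step:replay'] / total
    },
  }
}
```

Under these assumptions, a replay ratio close to 1 on a resumed run would indicate that most of the earlier work was reused.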