Durable execution (Temporal-style)
Wrap side-effectful steps so crashes, deploys, and retries replay from a step log instead of starting over.
When an agent crashes halfway through a 10-step workflow, the user
doesn't want you to start over — they want you to resume. Durable
execution gives you that with two primitives: a StepLogStore
(persistence) and a runner.step(id, fn) wrapper (short-circuits to
the recorded result if the id already exists in the log).
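The short-circuit idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the @agentskit/runtime implementation: `createSketchRunner`, `StepLog`, and `demoResume` are hypothetical names, and a plain `Map` stands in for a persistent step log.

```typescript
// Hypothetical stand-in for a step log: stepId -> recorded result.
// A real store would persist this across process restarts.
type StepLog = Map<string, unknown>

function createSketchRunner(log: StepLog) {
  return {
    async step<T>(stepId: string, fn: () => Promise<T>): Promise<T> {
      // Already recorded: return the stored result without calling fn.
      if (log.has(stepId)) return log.get(stepId) as T
      const result = await fn()
      log.set(stepId, result)
      return result
    },
  }
}

// Simulates a crash after the first step, then a second run of the same
// workflow against the same log.
async function demoResume(): Promise<string[]> {
  const log: StepLog = new Map()
  const executed: string[] = []

  const workflow = async (crashAfterFirst: boolean): Promise<void> => {
    const runner = createSketchRunner(log)
    await runner.step('create-account', async () => {
      executed.push('create-account')
      return 'user-1'
    })
    if (crashAfterFirst) throw new Error('simulated crash')
    await runner.step('send-welcome', async () => {
      executed.push('send-welcome')
      return true
    })
  }

  await workflow(true).catch(() => undefined) // first run dies midway
  await workflow(false) // second run resumes from the log
  return executed
}
```

In this sketch the `create-account` body runs once across both runs, because the second run finds its id in the log and skips straight to `send-welcome`.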
Install
Ships with @agentskit/runtime.
Wrap side effects in steps
import {
  createDurableRunner,
  createFileStepLog,
} from '@agentskit/runtime'

const store = await createFileStepLog('./runs/user-42.jsonl')
const runner = createDurableRunner({
  store,
  runId: 'user-42-onboard',
  maxAttempts: 3,
  retryDelayMs: 500,
})
await runner.step('create-account', async () => createAccount({ email }))
await runner.step('send-welcome', async () => sendEmail({ to: email, template: 'welcome' }))
await runner.step('charge-trial', async () => stripe.subscriptions.create({ ... }))
Re-run the same code (same runId) after a crash: completed steps
short-circuit to their recorded values, and only the remaining ones
execute.
Stores
- createInMemoryStepLog(): tests, single-process demos.
- createFileStepLog(path): JSONL on disk, append-only, survives restarts.
- Bring your own: anything implementing { append, get, list, clear? } works (Redis, Postgres, S3, etc.).
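As a rough sketch of what a bring-your-own store might look like, here is a Map-backed version. The source only names the four methods, so everything else is an assumption made for illustration: the record shape (`SketchStepRecord`), the method signatures, and the "latest record wins" lookup. Check the real StepLogStore type in @agentskit/runtime before adapting it.

```typescript
// ASSUMPTION: record shape and signatures are guesses, not the library's types.
interface SketchStepRecord {
  runId: string
  stepId: string
  status: 'success' | 'failure'
  result?: unknown
  error?: string
}

interface SketchStepLogStore {
  append(record: SketchStepRecord): Promise<void>
  get(runId: string, stepId: string): Promise<SketchStepRecord | undefined>
  list(runId: string): Promise<SketchStepRecord[]>
  clear?(runId: string): Promise<void>
}

function createMapStepLog(): SketchStepLogStore {
  // One append-only array of records per run.
  const runs = new Map<string, SketchStepRecord[]>()
  return {
    async append(record) {
      const records = runs.get(record.runId) ?? []
      records.push(record)
      runs.set(record.runId, records)
    },
    async get(runId, stepId) {
      const records = runs.get(runId) ?? []
      // Scan from the end so the most recent record for a step wins.
      for (let i = records.length - 1; i >= 0; i--) {
        const record = records[i]
        if (record && record.stepId === stepId) return record
      }
      return undefined
    },
    async list(runId) {
      // Return a copy so callers cannot mutate the log.
      return (runs.get(runId) ?? []).slice()
    },
    async clear(runId) {
      runs.delete(runId)
    },
  }
}
```

A Redis or Postgres version would keep the same four methods and swap the `Map` for network calls, which is why the sketch makes every method async.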
Step contract
A step is idempotent from the log's perspective: the fn performs the
side effect, and the recorded result captures everything downstream
steps need. Don't rely on global state outside the result.
const { userId } = await runner.step('create-account', async () => ({
  userId: await createAccount(email),
}))
// Downstream steps use `userId` — NOT global `req.user.id`, which
// might not exist on a resumed run.
await runner.step('send-welcome', async () => sendEmail({ userId }))
Retries
- maxAttempts: total attempts per step (default 1, fail-fast).
- retryDelayMs: fixed backoff between attempts (default 0).
- A step that fails all attempts is recorded with status: 'failure'; replaying the same stepId re-throws without running again (so you can diagnose without re-executing expensive failing work).
Call runner.reset() to wipe the log for a fresh retry.
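The retry rules above can be sketched as follows. This is an illustration of the described behavior under assumptions, not the library's code: `attemptStep`, `createRetryingRunner`, and the `Outcome` type are hypothetical, and the log is an in-memory `Map`.

```typescript
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Hypothetical record of how a step ended.
type Outcome<T> =
  | { status: 'success'; result: T }
  | { status: 'failure'; error: string }

// Runs fn up to maxAttempts times with a fixed delay between attempts.
async function attemptStep<T>(
  fn: () => Promise<T>,
  maxAttempts = 1,
  retryDelayMs = 0,
): Promise<Outcome<T>> {
  let lastError = 'step was never attempted'
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { status: 'success', result: await fn() }
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err)
      // Wait only if another attempt is coming.
      if (attempt < maxAttempts && retryDelayMs > 0) await sleep(retryDelayMs)
    }
  }
  return { status: 'failure', error: lastError }
}

function createRetryingRunner(
  log: Map<string, Outcome<unknown>>,
  maxAttempts: number,
  retryDelayMs: number,
) {
  return {
    async step<T>(stepId: string, fn: () => Promise<T>): Promise<T> {
      let outcome = log.get(stepId)
      if (!outcome) {
        // Not in the log yet: run with retries and record the final outcome.
        outcome = await attemptStep(fn, maxAttempts, retryDelayMs)
        log.set(stepId, outcome)
      }
      // A recorded failure re-throws on replay without calling fn again.
      if (outcome.status === 'failure') throw new Error(outcome.error)
      return outcome.result as T
    },
    // Wipes the log so the next call executes every step again.
    reset() {
      log.clear()
    },
  }
}
```

Recording the failure rather than leaving the step absent is the design choice that makes replays cheap: a second call with the same stepId reads the log and throws immediately, and only `reset()` makes the step eligible to run again.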
Observability
createDurableRunner({
  store,
  runId,
  onEvent: e => logger.debug('durable', e),
})
Events: step:replay, step:start, step:success, step:failure.
See also
- HITL approvals — pause a step until a human approves.
- Background agents — run durable flows on a cron or webhook.