Relay
A task scheduler and workflow engine for managing complex, interdependent work across teams and systems, with built-in reliability, observability, and human oversight.
Why We Built It
Project delivery involves hundreds of tasks: code reviews, deployments, data migrations, incident response. Standard schedulers like cron are fragile. They don't handle failures, their jobs can't be rolled back, and when something breaks at 2 a.m., you're flying blind.
We needed a system that:
- Retries failed tasks automatically with exponential backoff
- Lets us inspect, debug, and manually trigger tasks from a dashboard
- Stores complete history so we can audit what happened
- Integrates with our monitoring so failures surface before they hurt users
- Makes it safe to deploy new workflows without shutting down running ones
How It Works
Relay runs as a distributed service. Tasks are defined as code and stored in a database. A scheduler polls for due tasks and dispatches them to workers. Workers execute each task and report the result, and the system then decides whether to retry it or move on.
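The poll, dispatch, and decide loop above can be sketched in a single process. This is a minimal illustration under stated assumptions, not Relay's implementation: a heap ordered by due time stands in for the task database, tasks run inline instead of on separate workers, and `Task`, `Scheduler`, `poll_once`, and `max_attempts` are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical sketch: names are illustrative, not Relay's real API.
@dataclass(order=True)
class Task:
    due_at: float                                    # the only field used for ordering
    name: str = field(compare=False)
    run: Callable[[], None] = field(compare=False)
    attempts: int = field(default=0, compare=False)
    max_attempts: int = field(default=3, compare=False)


class Scheduler:
    """Single-process stand-in: a heap plays the task database, tasks run inline as the 'worker'."""

    def __init__(self) -> None:
        self._queue: list[Task] = []
        self.completed: list[str] = []
        self.failed: list[str] = []

    def submit(self, task: Task) -> None:
        heapq.heappush(self._queue, task)

    def poll_once(self, now: float) -> None:
        # Pop and dispatch every task whose due time has arrived.
        while self._queue and self._queue[0].due_at <= now:
            self._dispatch(heapq.heappop(self._queue), now)

    def _dispatch(self, task: Task, now: float) -> None:
        task.attempts += 1
        try:
            task.run()
        except Exception:
            # Decide: reschedule with a doubling delay (1s, 2s, ...) or give up.
            if task.attempts < task.max_attempts:
                task.due_at = now + 2 ** (task.attempts - 1)
                heapq.heappush(self._queue, task)
            else:
                self.failed.append(task.name)
        else:
            self.completed.append(task.name)
```

In a distributed deployment the heap would presumably become a query for due rows, and dispatch would hand work to separate worker processes; the sketch only shows the control flow.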
Core Features
Automatic Retries
Failed tasks are retried with exponential backoff (1s, 2s, 4s, 8s, ...) up to a limit. Idempotent tasks can be retried indefinitely; critical ones fail after N attempts.
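One way to express this retry schedule, assuming the delay doubles from a 1-second base and is capped at a maximum. The `RetryPolicy` class, its field names, and the 300-second cap are illustrative assumptions. `max_attempts=None` models idempotent tasks that retry indefinitely; a number models critical tasks that fail after N attempts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    base_delay: float = 1.0               # seconds before the first retry
    max_delay: float = 300.0              # hypothetical cap on any single delay
    max_attempts: Optional[int] = None    # None: retry indefinitely (idempotent tasks)

    def delay_for(self, attempt: int) -> float:
        """Delay before retry number `attempt` (1-based): 1s, 2s, 4s, 8s, ... up to max_delay."""
        if attempt < 1:
            raise ValueError("attempt is 1-based")
        return min(self.base_delay * 2 ** (attempt - 1), self.max_delay)

    def should_retry(self, attempts_so_far: int) -> bool:
        """True if another attempt is allowed after `attempts_so_far` executions."""
        return self.max_attempts is None or attempts_so_far < self.max_attempts
```

Production systems often add random jitter to each delay so that many tasks failing at once don't retry in lockstep; it is left out here to keep the schedule deterministic.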
Complete Observability
Every task execution is logged: what ran, when it ran, what the input was, and what the output was. The dashboard lets you search, filter, and drill into any execution.
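A sketch of what each logged execution might capture, with the dashboard's search and filter reduced to a list comprehension. The `ExecutionRecord` fields and the `ExecutionLog.search` parameters are assumptions; a real deployment would query the database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


# Hypothetical record shape: what ran, when, with what input, producing what output.
@dataclass
class ExecutionRecord:
    task: str
    started_at: datetime
    finished_at: datetime
    input_payload: Any
    output_payload: Any
    status: str  # e.g. "succeeded" or "failed"

    @property
    def duration_seconds(self) -> float:
        return (self.finished_at - self.started_at).total_seconds()


class ExecutionLog:
    """In-memory stand-in for the execution history store."""

    def __init__(self) -> None:
        self._records: list[ExecutionRecord] = []

    def record(self, rec: ExecutionRecord) -> None:
        self._records.append(rec)

    def search(self, task: Optional[str] = None, status: Optional[str] = None,
               since: Optional[datetime] = None) -> list[ExecutionRecord]:
        # Each filter is optional; None means "don't filter on this field".
        return [
            r for r in self._records
            if (task is None or r.task == task)
            and (status is None or r.status == status)
            and (since is None or r.started_at >= since)
        ]
```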
Manual Controls
Need to rerun a task that failed? Trigger it from the UI. Need to cancel a long-running workflow? Click a button. No SSH required.
Dead Letter Queues
Tasks that fail repeatedly are moved to a holding area. Ops can investigate, fix the underlying issue, and retry them when ready.
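A minimal in-memory dead letter queue, assuming each parked task keeps its payload, last error, and attempt count so ops can inspect it and retry it by hand. `DeadLetterQueue`, `park`, `pending`, and `retry` are hypothetical names, not part of any documented Relay API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any, Callable


@dataclass
class DeadLetter:
    entry_id: int
    task: str
    payload: Any
    last_error: str
    attempts: int


class DeadLetterQueue:
    """Hypothetical holding area for tasks that exhausted their automatic retries."""

    def __init__(self) -> None:
        self._entries: dict[int, DeadLetter] = {}
        self._ids = count(1)

    def park(self, task: str, payload: Any, error: Exception, attempts: int) -> int:
        entry = DeadLetter(next(self._ids), task, payload, repr(error), attempts)
        self._entries[entry.entry_id] = entry
        return entry.entry_id

    def pending(self) -> list[DeadLetter]:
        return list(self._entries.values())

    def retry(self, entry_id: int, handler: Callable[[Any], Any]) -> Any:
        """Manually re-run a parked task after the root cause is fixed.

        On success the entry is removed; on failure it stays parked with the new error.
        """
        entry = self._entries[entry_id]
        entry.attempts += 1
        try:
            result = handler(entry.payload)
        except Exception as exc:
            entry.last_error = repr(exc)
            raise
        del self._entries[entry_id]
        return result
```

Re-raising on a failed manual retry keeps the error visible to whoever triggered it, while the entry stays in the queue for the next attempt.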
How It Improves Delivery
With Relay, we went from worrying about whether background jobs would complete to confidently deploying complex multi-step workflows. Recovery from transient failures is now automatic.