Proprietary Tool Internal • Active

Relay

Relay is a task scheduler and workflow engine built to manage complex, interdependent work across teams and systems. It handles retries, execution logging, and manual intervention so background work stays reliable, observable, and under human oversight.

Why We Built It

Project delivery involves hundreds of tasks: code reviews, deployments, data migrations, incident response. Standard schedulers like cron are fragile. They don't handle failures, runs can't be rolled back, and when something breaks at 2am, you're flying blind.

We needed a system that:

Retries failed tasks automatically with exponential backoff
Lets us inspect, debug, and manually trigger tasks from a dashboard
Stores complete history so we can audit what happened
Integrates with our monitoring so failures surface before they hurt users
Makes it safe to deploy new workflows without shutting down running ones

How It Works

Relay runs as a distributed service. Tasks are defined as code and stored in a database. A scheduler polls for due tasks and dispatches them to workers. Workers execute, report results, and the system decides whether to retry or move on.
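The poll, dispatch, and retry-or-move-on cycle described above can be sketched roughly as follows. This is a minimal illustration, not Relay's actual code: it assumes an in-memory list in place of the database and a single in-process worker, and the names `Task`, `poll_due`, `dispatch`, and `tick` are hypothetical, as is the fixed `retry_delay`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    run_at: float                 # time (in seconds) at which the task becomes due
    action: Callable[[], object]  # the work a worker would execute
    attempts: int = 0
    status: str = "pending"       # pending -> succeeded | failed


def poll_due(tasks: List[Task], now: float) -> List[Task]:
    """Return pending tasks whose scheduled time has passed."""
    return [t for t in tasks if t.status == "pending" and t.run_at <= now]


def dispatch(task: Task, now: float, max_attempts: int = 3,
             retry_delay: float = 1.0) -> None:
    """Run one task, then decide whether to retry or move on."""
    task.attempts += 1
    try:
        task.action()
        task.status = "succeeded"
    except Exception:
        if task.attempts >= max_attempts:
            task.status = "failed"             # give up and move on
        else:
            task.run_at = now + retry_delay    # stay pending, retry later


def tick(tasks: List[Task], now: float, max_attempts: int = 3) -> None:
    """One scheduler pass: find due tasks and dispatch each of them."""
    for task in poll_due(tasks, now):
        dispatch(task, now, max_attempts)
```

A real deployment would call something like `tick` on a timer and hand tasks to remote workers; passing `now` explicitly here simply keeps the sketch deterministic and easy to test.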

Core Features

Automatic Retries

Failed tasks retry with exponential backoff (1s, 2s, 4s, 8s, ...) up to a configurable attempt limit. Idempotent tasks can be set to retry indefinitely; critical ones fail permanently after N attempts.
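As a rough sketch of that schedule, the delay doubles with each attempt starting from one second. The function names, the `cap` on the maximum delay, and the use of `None` to mean "no attempt limit" are assumptions made for illustration and are not taken from Relay itself.

```python
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    The cap is an assumed safeguard so delays do not grow without bound.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, base * 2 ** (attempt - 1))


def should_retry(attempts_so_far: int, max_attempts: Optional[int]) -> bool:
    """Decide whether another attempt is allowed.

    max_attempts=None models a task that may retry indefinitely;
    an integer N models a task that fails for good after N attempts.
    """
    return max_attempts is None or attempts_so_far < max_attempts
```

With the defaults, `[backoff_delay(n) for n in range(1, 5)]` gives `[1.0, 2.0, 4.0, 8.0]`, matching the sequence above.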

Complete Observability

Every task execution is logged: what ran, when, what the input was, and what the output was. The dashboard lets you search, filter, and drill into any execution.
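To make the logged fields concrete, here is one possible shape for an execution record, plus a filter of the kind a dashboard search might run. The `ExecutionRecord` fields and the `search` function are hypothetical; the actual schema and query interface are not described here.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class ExecutionRecord:
    task_name: str     # what ran
    started_at: float  # when it ran (seconds since epoch)
    input: Any         # what the input was
    output: Any        # what the output was (or the error, on failure)
    status: str        # "succeeded" or "failed"


def search(records: List[ExecutionRecord],
           task_name: Optional[str] = None,
           status: Optional[str] = None,
           since: Optional[float] = None) -> List[ExecutionRecord]:
    """Return records matching every filter that was supplied, oldest first."""
    matches = [
        r for r in records
        if (task_name is None or r.task_name == task_name)
        and (status is None or r.status == status)
        and (since is None or r.started_at >= since)
    ]
    return sorted(matches, key=lambda r: r.started_at)
```

Records are frozen so that history cannot be altered after the fact, which is one simple way to keep an audit trail trustworthy.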

Manual Controls

Need to rerun a task that failed? Trigger it from the UI. Need to cancel a long-running workflow? Click a button. No SSH required.

Dead Letter Queues

Tasks that fail repeatedly go into a holding area. Ops can investigate, fix the underlying issue, and retry when ready.
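A dead letter queue of this kind can be sketched in a few lines. This is an in-memory illustration under assumed names (`DeadLetterQueue`, `park`, `retry_all`); a production version would persist parked tasks rather than hold them in a `deque`.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class DeadLetterQueue:
    """Holding area for tasks that have exhausted their retries."""

    def __init__(self) -> None:
        # Each entry is (task name, last error message).
        self._items: Deque[Tuple[str, str]] = deque()

    def park(self, task_name: str, error: str) -> None:
        """Set a repeatedly failing task aside for investigation."""
        self._items.append((task_name, error))

    def pending(self) -> List[Tuple[str, str]]:
        """List parked tasks with their last recorded error."""
        return list(self._items)

    def retry_all(self, runner: Callable[[str], None]) -> int:
        """Re-run every parked task once.

        Tasks that fail again stay parked with their new error.
        Returns the number of tasks that succeeded.
        """
        succeeded = 0
        for _ in range(len(self._items)):
            name, _old_error = self._items.popleft()
            try:
                runner(name)
                succeeded += 1
            except Exception as exc:
                self._items.append((name, str(exc)))
        return succeeded
```

Keeping the last error next to each parked task gives whoever investigates a starting point before deciding to retry.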

How It Improves Delivery

90% Reduction in manual ops work

5 min MTTR (mean time to recovery)

0 Tasks run twice by accident

100% Task history auditable

With Relay, we went from worrying about whether background jobs would complete to confidently deploying complex multi-step workflows. Recovery from failures is now automatic.
