eusahn/relay

Relay — webhook ingestion & delivery platform

A self-hosted mini Svix/Hookdeck: a reliability buffer that sits between webhook senders (Stripe, GitHub, …) and your application. It catches webhooks durably, verifies signatures, deduplicates retries, and fans them out to subscriber endpoints with exponential-backoff retries, dead-letter queues, and replay.

Built on AWS serverless (Lambda, API Gateway, DynamoDB, SNS→SQS) with Terraform. Runs for ~$0/month, is account-agnostic (clone and apply into any AWS account), and is designed to be torn down with terraform destroy between sessions.

Architecture

sender ──→ API GW ──→ Lambda (api) ──→ DynamoDB
  (signed POST)        verify HMAC,        │
                       dedupe, store       └─→ SNS topic ──┬─→ SQS deliveries ─→ Lambda ─→ subscriber URLs
                                               (fanout)    │      └ DLQ            (POST per matching sub,
                                                           │                        signed; demo receiver:
                                                           │                        ?mode=ok|fail|flaky)
                                                           ├─→ SQS audit ─→ Lambda ─→ S3 (raw JSON archive)
                                                           │      └ DLQ
                                                           └─→ SQS metrics ─→ Lambda ─→ DynamoDB counters
                                                                  └ DLQ                 (atomic ADD)

One accepted event fans out to three independent consumers — each with its own queue, DLQ, retry policy, and IAM scope, all stamped from the same queue-consumer Terraform module. A delivery outage never blocks the audit trail or the metrics. DLQ depth alarms notify an SNS alerts topic (email), and parked messages are recoverable via the replay API.

Quickstart

make bootstrap   # one-time: create the S3 remote-state bucket (survives destroys)
make init        # connect the dev stack to the state bucket (name derived from your account)
make apply       # deploy (~60s)
make smoke       # 13-step end-to-end test
make destroy     # tear it all down (state bucket survives for next session)

To remove everything including the state bucket — e.g. walking away from the project — use make destroy-all (dev stack first, then bucket).

Optional: put alert_email = "you@example.com" in terraform/envs/dev/terraform.tfvars (gitignored) before applying to receive DLQ alarm emails — AWS sends a confirmation link on first apply.

Requires: Terraform ≥ 1.10, AWS credentials, curl, openssl, python3.

API

| Route | Purpose |
| --- | --- |
| `POST /sources` | Register a sender. Returns id + signing secret (shown once). |
| `GET /sources` | List sources (without secrets). |
| `POST /in/{source_id}` | Ingest endpoint: paste into the sender's webhook config. |
| `GET /sources/{source_id}/events` | Recent events for a source, newest first. |
| `POST /sources/{source_id}/subscriptions` | Register a subscriber: `{url, events}` where `events` is a glob like `payment.*` (default `*`). |
| `GET /sources/{source_id}/subscriptions` | List subscriptions (without secrets). |
| `GET /sources/{source_id}/events/{event_id}/deliveries` | Per-subscription delivery status for an event. |
| `GET /sources/{source_id}/metrics` | Counts of events received, per day per type. |
| `POST /sources/{source_id}/events/{event_id}/replay` | Re-queue one event for delivery (already-delivered subscriptions are skipped). |
| `POST /replay/dlq` | Bulk recovery: redrive everything parked in the deliveries DLQ. |
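
The `events` glob on a subscription can be pictured with a small Python sketch. This is an illustration only: the README does not say which glob implementation the delivery worker uses, so `fnmatch` is an assumption, and the helper names and example URLs are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, event_type: str) -> bool:
    """Return True when event_type matches a subscription's glob pattern."""
    # fnmatchcase is case-sensitive on every platform; "*" also matches dots.
    return fnmatchcase(event_type, pattern)


def matching_subscriptions(subs, event_type):
    """Keep the subscriptions whose pattern matches; a missing pattern means "*"."""
    return [s for s in subs if matches(s.get("events", "*"), event_type)]


# Hypothetical subscriptions for one source.
subs = [
    {"url": "https://a.example/hook", "events": "payment.*"},
    {"url": "https://b.example/hook"},  # no pattern: receives everything
    {"url": "https://c.example/hook", "events": "refund.*"},
]
selected = matching_subscriptions(subs, "payment.succeeded")
print([s["url"] for s in selected])
```

Under these assumptions, a `payment.succeeded` event would go to the first two subscribers and skip the `refund.*` one.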

Ingest requests must carry x-relay-signature: hex HMAC-SHA256 of the raw body using the source secret. Optional x-idempotency-key header controls deduplication (defaults to a hash of the body). Duplicates are acked with 200 {"duplicate": true} so sender retries never see errors.
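
For senders that are not shell scripts, the signature can be computed in a few lines of Python. This is a minimal sketch: the secret value is a placeholder, encoding the secret as UTF-8 is an assumption, and the README only says the default idempotency key is "a hash of the body", so SHA-256 in `default_idempotency_key` is a guess for illustration.

```python
import hashlib
import hmac


def sign(secret: str, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body, keyed with the source secret."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def default_idempotency_key(raw_body: bytes) -> str:
    """Assumed fallback key: a SHA-256 digest of the body (algorithm not confirmed)."""
    return hashlib.sha256(raw_body).hexdigest()


# Sign exactly the bytes that go on the wire; re-serializing the JSON
# after signing would change the body and invalidate the signature.
body = b'{"type":"payment.succeeded","amount":4200}'
headers = {
    "content-type": "application/json",
    "x-relay-signature": sign("example-secret", body),
    # Optional: omit this header to let Relay derive a key from the body.
    "x-idempotency-key": "order-4200-attempt-1",
}
print(headers["x-relay-signature"])
```

Sending the same body twice with the same idempotency key should produce the `{"duplicate": true}` acknowledgement described above.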

Simulate a sender:

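API=$(make url)
# The response holds the source id and the signing secret (shown only once), so print it
SRC=$(curl -s -X POST "$API/sources" -d '{"name":"stripe"}'); echo "$SRC"
# Substitute the id and secret from the response for the placeholders below
./scripts/send-event.sh "$API" <src_id> <secret> '{"type":"payment.succeeded","amount":4200}'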

Demo: failure → retries → DLQ → alarm → replay

The deployed stack includes a demo receiver Lambda whose behavior switches on a query param, so every failure mode is reproducible:

API=$(make url)
RECEIVER=$(terraform -chdir=terraform/envs/dev output -raw receiver_url)

# Subscribe a *flaky* endpoint: fails the first delivery attempt, then recovers
curl -s -X POST "$API/sources/<src_id>/subscriptions" \
  -d "{\"url\": \"${RECEIVER}?mode=flaky\"}"

# Send an event, then watch the delivery go failed → delivered
./scripts/send-event.sh "$API" <src_id> <secret>
curl -s "$API/sources/<src_id>/events/<event_id>/deliveries"

# mode=fail exhausts all 3 attempts (with exponential backoff: 20s, 40s)
# and parks the message in the DLQ:
aws sqs get-queue-attributes \
  --queue-url "$(terraform -chdir=terraform/envs/dev output -raw deliveries_dlq_url)" \
  --attribute-names ApproximateNumberOfMessages

# ...which fires the DLQ CloudWatch alarm (email arrives if alert_email is set).
# Recover after fixing the endpoint:
curl -s -X POST "$API/replay/dlq"                                  # bulk redrive
curl -s -X POST "$API/sources/<src_id>/events/<event_id>/replay"   # or one event
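
The flaky-endpoint behavior in this demo can be simulated locally. The Python sketch below is not the worker's code: the doubling formula is inferred from the 20s and 40s delays mentioned above, and `deliver`, `flaky_send`, and the in-memory `delivered` set (standing in for the delivery records) are hypothetical names.

```python
BASE_DELAY_S = 20   # first retry delay, taken from the demo above
MAX_ATTEMPTS = 3    # after the third failure the message goes to the DLQ


def backoff_delay(attempt: int) -> int:
    """Seconds to wait after failed attempt `attempt` (1-based): 20, 40, ..."""
    return BASE_DELAY_S * 2 ** (attempt - 1)


class DeliveryFailed(Exception):
    """Raised so the queue redelivers the whole message."""


def deliver(event_id, subs, delivered, send):
    """Try every subscription not yet marked delivered; raise if any failed."""
    failed = []
    for sub_id, url in subs.items():
        if (event_id, sub_id) in delivered:
            continue  # already succeeded on an earlier attempt
        if send(url):
            delivered.add((event_id, sub_id))
        else:
            failed.append(sub_id)
    if failed:
        raise DeliveryFailed(failed)


calls = []

def flaky_send(url):
    """Fake HTTP POST: a mode=flaky URL fails once, then succeeds."""
    calls.append(url)
    return not (url.endswith("mode=flaky") and calls.count(url) == 1)


subs = {"s1": "https://r.example/?mode=ok", "s2": "https://r.example/?mode=flaky"}
delivered = set()
for attempt in range(1, MAX_ATTEMPTS + 1):
    try:
        deliver("evt1", subs, delivered, flaky_send)
        break
    except DeliveryFailed:
        if attempt < MAX_ATTEMPTS:
            print(f"attempt {attempt} failed, retry in {backoff_delay(attempt)}s")
```

In this run the healthy endpoint is called once and the flaky one twice, which is the "no duplicate deliveries to healthy subscribers" property described in the design notes.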

Outbound deliveries are signed too (x-relay-signature with the subscription's secret) and carry x-relay-event-id / x-relay-attempt headers.
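
A subscriber can check these headers before trusting a delivery. The Python sketch below assumes the outbound signature uses the same scheme as ingest (hex HMAC-SHA256 of the raw body), which the README does not state explicitly; `verify_delivery` and the sample values are hypothetical.

```python
import hashlib
import hmac


def verify_delivery(secret: str, raw_body: bytes, headers: dict) -> bool:
    """Check x-relay-signature against the subscription secret (assumed scheme)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    received = lowered.get("x-relay-signature", "")
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), received.encode())


# Simulated incoming delivery, signed with a placeholder secret.
secret = "subscription-secret"
body = b'{"type":"payment.succeeded","amount":4200}'
incoming = {
    "X-Relay-Signature": hmac.new(secret.encode(), body, hashlib.sha256).hexdigest(),
    "X-Relay-Event-Id": "evt_123",
    "X-Relay-Attempt": "1",
}
print(verify_delivery(secret, body, incoming))
```

Because delivery is at-least-once, a receiver may also want to treat `x-relay-event-id` as its own deduplication key.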

Design notes

  • Single-table DynamoDB: each SRC#<id> partition holds the source META, its events (EVT#<ts>#<id>, time-ordered), DEDUP#<key> idempotency markers (24h TTL), SUB#<id> subscriptions, DLV#<event_id>#<sub_id> delivery records, and MET#<day>#<type> counters.
  • Exactly-once ingest — a DynamoDB transaction writes the event and a conditional dedup marker atomically; a sender retry cancels the transaction (checked against CancellationReasons, not blindly) and returns the original event id.
  • At-least-once delivery, no cross-talk — a failed delivery makes the worker raise, so SQS redrives the whole message with exponential backoff; DLV records mark which subscriptions already succeeded, so retries skip them and one flaky endpoint never causes duplicate deliveries to healthy ones. After 3 attempts the message parks in the DLQ, the depth alarm fires, and the replay API recovers it.
  • Replay bypasses the topic — replays go straight onto the deliveries queue, not SNS, so re-delivering an event never re-archives it or double-counts metrics.
  • Secrets — source/subscription signing secrets are returned only at creation and never listed. Stored in DynamoDB for HMAC use; production would envelope-encrypt them with KMS (skipped here: customer-managed keys cost $1/mo each).
  • Cost discipline — everything is pay-per-request, Lambdas stay out of any VPC (no NAT Gateway), log retention is 7 days, state bucket is the only persistent resource.
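
The exactly-once ingest note can be made concrete with a sketch of the transaction request. This is an illustration under assumptions, not the repository's code: the table name, the `PK`/`SK`/`ttl` attribute names, and the helper functions are hypothetical. Only the key prefixes and the 24h TTL come from the notes above.

```python
import time

TABLE = "relay"             # hypothetical table name
DEDUP_TTL_S = 24 * 3600     # 24h marker lifetime, per the design notes


def build_ingest_transaction(source_id, event_id, dedup_key, body, now=None):
    """Build TransactItems: a conditional dedup marker plus the event itself."""
    now = int(time.time()) if now is None else now
    pk = {"S": f"SRC#{source_id}"}
    return [
        {"Put": {  # index 0: fails the whole transaction if the marker exists
            "TableName": TABLE,
            "Item": {"PK": pk, "SK": {"S": f"DEDUP#{dedup_key}"},
                     "event_id": {"S": event_id},
                     "ttl": {"N": str(now + DEDUP_TTL_S)}},
            "ConditionExpression": "attribute_not_exists(PK)",
        }},
        {"Put": {  # index 1: the time-ordered event record
            "TableName": TABLE,
            "Item": {"PK": pk, "SK": {"S": f"EVT#{now}#{event_id}"},
                     "body": {"S": body}},
        }},
    ]


def is_duplicate(cancellation_reasons) -> bool:
    """True only when the dedup marker's condition failed, not on other errors."""
    return cancellation_reasons[0].get("Code") == "ConditionalCheckFailed"


items = build_ingest_transaction("src1", "evt1", "key1", '{"type":"x"}', now=1000)
print([i["Put"]["Item"]["SK"]["S"] for i in items])
# With boto3 this list would be passed as
#   client.transact_write_items(TransactItems=items)
# and, on TransactionCanceledException, is_duplicate() would be applied to
#   err.response["CancellationReasons"]
```

Checking the reason code at the marker's index is what separates a genuine sender retry from an unrelated cancellation such as throttling, which should surface as an error instead of a duplicate acknowledgement.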

Testing

  • make venv && make test — unit tests for the storage layer against an in-memory DynamoDB (moto); no AWS account needed, runs in ~1s.
  • make smoke — 13-step end-to-end test against a deployed stack, covering signed ingest, dedupe, fanout to all three consumers, filtering, and replay.

Roadmap

  1. Core: sources, signed ingest, idempotent event store
  2. Subscriptions + SNS→SQS fanout (deliveries/audit/metrics) + delivery worker + DLQs + demo receiver
  3. Exponential backoff, DLQ alarms → email, replay endpoints, CloudWatch dashboard, moto unit tests
  4. GitHub Actions CI/CD via OIDC (plan on PR, apply on merge, smoke test post-deploy)
