Implement transactional outbox pattern for Postgres+NATS dual-write safety #114

@haasonsaas

Context

Services that write to Postgres and then publish to NATS (objectives, audit, pipeline, approvals, notifications) have no transactional outbox. The current pattern in natsbus publishes directly after the DB write, so a crash between the two leaves the database and the event stream inconsistent:

tx.Commit()       // succeeds
nats.Publish(...)  // crashes here → event lost, DB state diverged from event stream

This is a fundamental data consistency problem for any event-driven architecture.
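
For comparison, here is a minimal sketch of the outbox-style write, where the event row is committed atomically with the domain change instead of being published afterwards. It uses only `database/sql`. The `outbox` table layout, `outboxInsertSQL`, `encodeEvent`, and `writeWithOutbox` are hypothetical names chosen for illustration and are not part of natsbus today.

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
)

// outboxInsertSQL targets an assumed table (not an existing schema):
//
//	CREATE TABLE outbox (
//	    id           BIGSERIAL PRIMARY KEY,
//	    subject      TEXT        NOT NULL,
//	    payload      JSONB       NOT NULL,
//	    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
//	    published_at TIMESTAMPTZ
//	);
const outboxInsertSQL = `INSERT INTO outbox (subject, payload) VALUES ($1, $2)`

// encodeEvent builds the arguments for outboxInsertSQL.
// The payload is passed as a JSON string so Postgres can coerce it to JSONB.
func encodeEvent(subject string, event any) ([]any, error) {
	if subject == "" {
		return nil, fmt.Errorf("outbox: empty subject")
	}
	body, err := json.Marshal(event)
	if err != nil {
		return nil, fmt.Errorf("outbox: marshal event: %w", err)
	}
	return []any{subject, string(body)}, nil
}

// writeWithOutbox runs the domain write and the outbox insert in one
// transaction: either both rows are committed or neither is.
func writeWithOutbox(ctx context.Context, db *sql.DB, domainWrite func(*sql.Tx) error, subject string, event any) error {
	args, err := encodeEvent(subject, event)
	if err != nil {
		return err
	}
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // returns sql.ErrTxDone after a successful Commit; ignored

	if err := domainWrite(tx); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx, outboxInsertSQL, args...); err != nil {
		return err
	}
	return tx.Commit()
}
```

With this shape, a crash after `tx.Commit()` no longer loses the event: the row is still in the outbox table and can be published later by a relay.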

Requirements

  • Add outbox.Publisher to service-runtime/natsbus that writes events to a Postgres outbox table within the same transaction as the domain write
  • Add background relay goroutine that polls the outbox table and publishes to NATS
  • Mark outbox rows as published after successful NATS ack
  • Add configurable polling interval and batch size
  • Add metrics: outbox depth, relay latency, publish failures
  • Provide WithOutbox(tx) option on the existing natsbus.Publish interface so services can opt in incrementally
  • Clean up published outbox rows after configurable retention (e.g., 24h)

Affected services

  • objectives — publishes state transition events
  • audit — publishes approval decision events
  • pipeline — publishes deal change events
  • approvals — publishes approval lifecycle events
  • notifications — publishes delivery events

Why this matters

Without this, a service restart or NATS outage that lands between the DB commit and the publish silently drops the event. For audit and compliance use cases, this is a data integrity violation.
