
Worker Architecture Overview

The Powernode Worker is a fully isolated Sidekiq 7.2 process that executes background jobs. It has zero direct database access — all data flows through HTTP API calls to the Rails backend.


Isolation Principle

┌─────────────────────────────────┐          ┌──────────────────────────────┐
│          WORKER                 │          │          SERVER              │
│                                 │          │                              │
│  Sidekiq 7.2 (221 jobs)         │   HTTP   │  Rails 8 API                 │
│  46 services                    │ ───────> │  398 controllers             │
│  Redis DB 1 (queues only)       │ <─────── │  603 services                │
│  NO database access             │   JSON   │  PostgreSQL + Redis DB 0     │
│  NO ActiveRecord models         │          │                              │
│  NO shared gems with server     │          │                              │
└─────────────────────────────────┘          └──────────────────────────────┘

Critical rules:

  • Job files belong in worker/app/jobs/, NEVER in server/app/jobs/
  • NEVER add Sidekiq gems to server/Gemfile
  • NEVER modify worker/ files when fixing server issues
  • Worker communicates exclusively via HTTP API — no ActiveRecord, no direct SQL

API-Only Communication

4 API Clients

| Client | File | Purpose |
|---|---|---|
| BackendApiClient | app/services/backend_api_client.rb | Primary server communication: accounts, subscriptions, analytics, AI, DevOps, and all CRUD operations |
| ApiClient | app/services/api_client.rb | Base HTTP client for analytics and reporting endpoints |
| WebAuthApiClient | app/services/web_auth_api_client.rb | Sidekiq Web UI authentication; uses a separate circuit breaker so auth failures don't affect job processing |
| LlmProxyClient | app/services/llm_proxy_client.rb | AI model proxy; routes LLM calls through the server's internal/ai/llm endpoints for tool calling, structured output, and memory injection |

Authentication

All API clients authenticate using JWT service tokens generated by WorkerJwt:

# worker/app/services/worker_jwt.rb
WorkerJwt.token  # Generates a JWT signed with WORKER_SERVICE_TOKEN

Two auth helpers:

  • PrimaryServiceAuth — Standard worker → server authentication
  • SystemWorkerAuth — System-level operations (elevated privileges)

Request headers:

Authorization: Bearer <worker-jwt-token>
Content-Type: application/json
User-Agent: PowernodeWorker/1.0
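The token and headers can be sketched with only the Ruby standard library. This is a minimal HS256 JWT encoder under stated assumptions: the module name `WorkerJwtSketch`, the `iss`/`iat`/`exp` claims, and the 5-minute lifetime are all hypothetical, and the real WorkerJwt may use different claims, a different lifetime, or the `jwt` gem.

```ruby
require "openssl"
require "base64"
require "json"

# Hypothetical stand-in for WorkerJwt; claim names and TTL are assumptions.
module WorkerJwtSketch
  TTL_SECONDS = 300 # assumed token lifetime

  module_function

  def b64url(data)
    Base64.urlsafe_encode64(data, padding: false)
  end

  # Compact HS256 JWT: base64url(header).base64url(payload).base64url(HMAC-SHA256 signature)
  def token(secret:, now: Time.now.to_i)
    header  = { alg: "HS256", typ: "JWT" }
    payload = { iss: "powernode-worker", iat: now, exp: now + TTL_SECONDS }
    signing_input = "#{b64url(header.to_json)}.#{b64url(payload.to_json)}"
    signature = OpenSSL::HMAC.digest("SHA256", secret, signing_input)
    "#{signing_input}.#{b64url(signature)}"
  end

  # The three request headers listed above.
  def request_headers(secret:)
    {
      "Authorization" => "Bearer #{token(secret: secret)}",
      "Content-Type" => "application/json",
      "User-Agent" => "PowernodeWorker/1.0"
    }
  end
end

# "dev-only-secret" is an example fallback only; a real deployment sets WORKER_SERVICE_TOKEN.
headers = WorkerJwtSketch.request_headers(secret: ENV.fetch("WORKER_SERVICE_TOKEN", "dev-only-secret"))
```

Signing with a shared secret means the server can verify the token with the same WORKER_SERVICE_TOKEN without a round trip.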

Circuit Breakers

All API clients use circuit breakers to prevent cascading failures:

| Circuit Breaker | Timeout | Use Case |
|---|---|---|
| Backend API | 120s | Standard server communication |
| AI Provider | 600s | AI model calls (long-running) |
| Mission Execution | 600s | Mission phase jobs |
| Web Auth | Separate breaker | Sidekiq Web authentication, isolated from job processing |

The circuit breaker pattern is implemented in app/services/concerns/circuit_breaker.rb.
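The general pattern can be sketched as a small state machine: closed until repeated failures, then open (fail fast) for a cool-down, then half-open to let one trial call through. This is an illustration of the technique, not the project's circuit_breaker.rb; the class name, failure threshold, and reset timeout are hypothetical, and the doc does not say whether its listed timeouts are request timeouts or open-state durations.

```ruby
# Minimal circuit breaker sketch (hypothetical; not the project's implementation).
class CircuitBreakerSketch
  class OpenError < StandardError; end

  attr_reader :state

  def initialize(failure_threshold: 5, reset_timeout: 120,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @reset_timeout = reset_timeout # seconds to stay open before allowing a trial call
    @clock = clock
    @state = :closed
    @failures = 0
    @opened_at = nil
  end

  def call
    if @state == :open
      raise OpenError, "circuit open" if @clock.call - @opened_at < @reset_timeout
      @state = :half_open # let one trial request through
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    reset
    result
  end

  private

  def record_failure
    @failures += 1
    # A failed trial while half-open, or too many failures while closed, opens the circuit.
    if @state == :half_open || @failures >= @failure_threshold
      @state = :open
      @opened_at = @clock.call
    end
  end

  def reset
    @state = :closed
    @failures = 0
    @opened_at = nil
  end
end
```

Keeping one breaker instance per downstream (as the table does for Web Auth) stops one failing dependency from tripping calls to another.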


Job Inheritance Hierarchy

Sidekiq::Job (from sidekiq gem)
    └── BaseJob (worker/app/jobs/base_job.rb)
            ├── AiAgentExecutionJob
            ├── AiMissionAnalyzeJob
            ├── AiRalphIterationJob
            ├── Devops::StepExecutionJob
            ├── Notifications::EmailDeliveryJob
            ├── ... (all 220+ jobs)
            └── Maintenance::ScheduledBackupJob

BaseJob Features

All jobs inherit from BaseJob and must implement the execute(*args) method:

class MyJob < BaseJob
  sidekiq_options queue: 'default', retry: 3

  def execute(*args)
    # Job logic here — use api_client for server communication
    result = api_client.get("/api/v1/resource/#{args[0]}")
    api_client.post("/api/v1/resource", { data: result })
  end
end
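How BaseJob might route Sidekiq's `perform` into `execute` can be sketched without Sidekiq itself. This assumes a template-method design in which `perform` wraps `execute` with timing; `BaseJobSketch`, `EchoJob`, and the in-memory `executions` log are hypothetical, and the real BaseJob also includes Sidekiq::Job and adds the features in the table that follows.

```ruby
# Hypothetical sketch of a perform -> execute hand-off; runs without Sidekiq.
class BaseJobSketch
  Execution = Struct.new(:job, :status, :duration)

  class << self
    # Shared in-memory log standing in for the Redis execution tracking.
    def executions
      @executions ||= []
    end
  end

  # Sidekiq calls #perform; subclasses implement #execute instead.
  def perform(*args)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = execute(*args)
    record(:success, started)
    result
  rescue StandardError
    record(:failure, started)
    raise # re-raise so a caller such as Sidekiq's retry logic still sees the error
  end

  def execute(*_args)
    raise NotImplementedError, "#{self.class} must implement execute"
  end

  private

  def record(status, started)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    BaseJobSketch.executions << Execution.new(self.class.name, status, elapsed)
  end
end

class EchoJob < BaseJobSketch
  def execute(value)
    "processed #{value}"
  end
end
```

Because every job enters through the same `perform`, cross-cutting concerns such as tracking and loop detection live in one place.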

BaseJob provides:

| Feature | Description |
|---|---|
| api_client | Pre-configured BackendApiClient instance |
| logger | Structured logging with metadata |
| Runaway loop detection | Detects >5 executions/minute, disables job for 5 minutes |
| Execution tracking | Records success/failure timing in Redis |
| Exponential backoff | Retry delays chosen by error type: a fixed schedule for API errors, exponential backoff with jitter otherwise |
| Idempotency helpers | already_processed?(key) / mark_processed(key) |
| Metrics tracking | increment_counter(), track_performance_metric() |
| API retry wrapper | with_api_retry(max_attempts: 3) with retryable error detection |
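The idempotency helpers amount to a "set if absent, with expiry" check. A minimal sketch, assuming a hypothetical in-memory `ExpiringStore` in place of Redis (where the same guard is usually `SET key value NX EX ttl`) and an assumed 24-hour TTL; the real helpers' key format and TTL are not documented here.

```ruby
# In-memory stand-in for a Redis key space with per-key expiry (hypothetical).
class ExpiringStore
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @data = {}
  end

  # Mirrors Redis "SET key value NX EX ttl": true only if the key was absent or expired.
  def set_nx(key, value, ttl:)
    purge(key)
    return false if @data.key?(key)
    @data[key] = [value, @clock.call + ttl]
    true
  end

  def exists?(key)
    purge(key)
    @data.key?(key)
  end

  private

  def purge(key)
    entry = @data[key]
    @data.delete(key) if entry && entry[1] <= @clock.call
  end
end

# Hypothetical versions of the helpers named in the table.
module IdempotencySketch
  TTL = 24 * 60 * 60 # assumed retention window in seconds

  def already_processed?(key)
    idempotency_store.exists?("processed:#{key}")
  end

  def mark_processed(key)
    idempotency_store.set_nx("processed:#{key}", "1", ttl: TTL)
  end
end

class InvoiceJobSketch
  include IdempotencySketch

  def initialize(store)
    @store = store
  end

  def idempotency_store
    @store
  end

  def execute(invoice_id)
    return :skipped if already_processed?(invoice_id)
    # ... side-effecting work via the API client would go here ...
    mark_processed(invoice_id)
    :done
  end
end
```

Checking and marking in two steps leaves a small race window between concurrent runs; using the boolean returned by `set_nx` as the guard closes it.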

Retry Strategy

# Default: 3 retries with exponential backoff
sidekiq_options retry: 3

# API errors use a fixed schedule: 30s, 60s, 180s
# Other errors: exponential backoff with jitter (count**4 + 15 + random jitter)

Retryable HTTP status codes: 408, 429, 500, 502, 503, 504
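The two delay schedules and the status check can be sketched as plain functions. This assumes the retry count is 0 for the first retry and that counts past the third reuse 180s; the `api_error:` flag, the method names, and the jitter term `rand(10) * (count + 1)` are assumptions (Sidekiq's own default follows a similar `count**4 + 15 + jitter` shape).

```ruby
RETRYABLE_STATUSES = [408, 429, 500, 502, 503, 504].freeze
API_RETRY_DELAYS = [30, 60, 180].freeze # seconds, per the schedule above

# True when an HTTP status is worth retrying.
def retryable_status?(status)
  RETRYABLE_STATUSES.include?(status)
end

# Delay in seconds before retry number `count` (0-based, assumed).
# `api_error` is a hypothetical flag for "this failure came from the backend API".
def retry_delay(count, api_error:, rng: Random.new)
  if api_error
    API_RETRY_DELAYS.fetch(count, API_RETRY_DELAYS.last)
  else
    count**4 + 15 + rng.rand(10) * (count + 1)
  end
end
```

In a Sidekiq job, one way to wire this in is a `sidekiq_retry_in do |count, exception| ... end` block that calls `retry_delay`.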


Service Layer (41 files)

Core Services

| Service | Purpose |
|---|---|
| BackendApiClient | HTTP client for server communication |
| ApiClient | Base HTTP client |
| WebAuthApiClient | Sidekiq Web auth client (separate circuit breaker) |
| LlmProxyClient | AI model proxy through server LLM endpoints |
| BaseWorkerService | Base class for worker-side services |
| WorkerJwt | JWT token generation for service auth |
| PrimaryServiceAuth | Standard service authentication |
| SystemWorkerAuth | System-level worker authentication |
| McpSecurityService | MCP credential decryption for tool execution |

Domain Services

| Service | Purpose |
|---|---|
| EmailDeliveryWorkerService | Email delivery via configured provider |
| EmailConfigurationService | Email provider configuration |
| AnalyticsWorkerService | Analytics processing and aggregation |
| AnalyticsNotificationService | Analytics-based notification triggers |
| FileProcessingService | File upload processing and virus scanning |
| PdfReportWorkerService | PDF report generation |
| FirebaseService | Firebase push notification delivery |
| TwilioService | SMS delivery via Twilio |
| AiErrorTrackingService | AI execution error classification and tracking |

DevOps Services (16 files)

| Category | Files | Purpose |
|---|---|---|
| devops/deployment_service.rb | 1 | Deployment execution |
| devops/git_operations_service.rb | 1 | Git operations |
| devops/git_providers/ | 5 | Base, Gitea, GitHub, GitLab providers + factory |
| devops/git_providers/webhook_normalizer.rb | 1 | Cross-provider webhook normalization |
| devops/step_handlers/ | 12 | Pipeline step handlers (checkout, deploy, create PR, run command, etc.) |

Shared Concerns

| Concern | Purpose |
|---|---|
| concerns/circuit_breaker.rb | Circuit breaker pattern implementation |
| concerns/distributed_lock.rb | Redis-based distributed locking |
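A Redis lock of this kind is usually built from `SET key token NX PX ttl` plus an ownership check on release. The sketch below shows that protocol against a hypothetical in-memory `FakeLockRedis` so it runs without Redis; the class and method names and the 30-second TTL are assumptions, and against real Redis the compare-and-delete on release must run as a single atomic Lua script.

```ruby
require "securerandom"

# In-memory stand-in for the two Redis operations a lock needs (hypothetical).
class FakeLockRedis
  def initialize(clock:)
    @clock = clock
    @keys = {}
  end

  # Like "SET key token NX PX ttl_ms": succeeds only if the key is free or expired.
  def set_nx_px(key, token, ttl_ms)
    current = @keys[key]
    return false if current && current[1] > @clock.call
    @keys[key] = [token, @clock.call + ttl_ms / 1000.0]
    true
  end

  # Deletes only if the caller still owns the lock (one Lua script in real Redis).
  def compare_and_delete(key, token)
    current = @keys[key]
    return false unless current && current[0] == token
    @keys.delete(key)
    true
  end
end

class DistributedLockSketch
  def initialize(redis, ttl_ms: 30_000)
    @redis = redis
    @ttl_ms = ttl_ms
  end

  # Runs the block only if the lock was acquired; returns nil otherwise.
  def with_lock(name)
    token = SecureRandom.hex(16) # unique per holder, so we never release someone else's lock
    return nil unless @redis.set_nx_px("lock:#{name}", token, @ttl_ms)
    begin
      yield
    ensure
      @redis.compare_and_delete("lock:#{name}", token)
    end
  end
end
```

The TTL bounds how long a crashed holder can block others; the random token prevents a slow holder whose lock already expired from deleting a newer holder's lock.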

Job Concerns (12 files)

| Concern | Purpose |
|---|---|
| ai_jobs_concern.rb | Common AI job helpers |
| ai_llm_proxy_concern.rb | LLM proxy integration |
| ai_cost_calculation_concern.rb | AI cost tracking |
| a2a_artifact_extraction_concern.rb | A2A protocol artifact handling |
| chat_streaming_concern.rb | Chat response streaming |
| chat_fallback_providers_concern.rb | Chat provider fallback logic |
| health_check_steps_concern.rb | Health check step definitions |
| health_data_fetchers_concern.rb | Health data collection |
| metrics_tracking.rb | Metrics collection helpers |
| reports/csv_json_report_concern.rb | CSV/JSON report generation |
| reports/pdf_report_concern.rb | PDF report generation |
| reports/xlsx_report_concern.rb | Excel report generation |

Scheduling (sidekiq-scheduler)

The worker uses sidekiq-scheduler for cron-based job scheduling. All schedules are defined in worker/config/sidekiq.yml under the :schedule: key.

Schedule Summary

| Frequency | Jobs |
|---|---|
| Every minute | Docker::HostSyncJob, Swarm::ClusterSyncJob |
| Every 5 minutes | Docker::HealthCheckJob, Swarm::HealthCheckJob, Git::RunnerHealthCheckJob |
| Every 10 minutes | AiProviderHealthCheckJob |
| Hourly | Devops::ApprovalExpiryJob, AiProposalExpiryJob, AiBudgetRolloverJob |
| Every 6 hours | AiProviderModelSyncJob, Compliance::AccountTerminationJob, ChatSessionCleanupJob |
| Daily 1-2 AM | AiPricingSyncJob, AiTrustDecayJob, Maintenance::ScheduledBackupJob, Compliance::DataRetentionEnforcementJob |
| Daily 3-4 AM | AiMemoryPoolCleanupJob, AiCompoundLearningMaintenanceJob, AiMemoryMaintenanceJob, AiTeamMessageCleanupJob, AiBudgetReconciliationJob |
| Daily 4-5 AM | AiSharedKnowledgeMaintenanceJob, AiSkillLifecycleMaintenanceJob (daily), AiKnowledgeGraphMaintenanceJob, Maintenance::BackupCleanupJob |
| Daily 5 AM | Swarm::EventCleanupJob, Docker::EventCleanupJob |
| Daily 5:30 AM | AiKnowledgeDocSyncJob |
| Weekly (Sunday) | Maintenance::ScheduledBackupJob (schema), AiSkillLifecycleMaintenanceJob (weekly) |
| Monthly (1st) | AiSkillLifecycleMaintenanceJob (monthly) |
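For orientation, two rows of the table might look like this under the `:schedule:` key of sidekiq.yml. The entry names and cron strings are illustrative guesses at the listed frequencies, not copied from the real file; the YAML is embedded in Ruby and parsed with Psych only to show the shape sidekiq-scheduler reads (a named entry holding a `cron` expression and the job `class`).

```ruby
require "yaml"

# Illustrative :schedule: entries; names and cron strings are assumptions.
SCHEDULE_YAML = <<~YAML
  :schedule:
    ai_provider_health_check:
      cron: "*/10 * * * *"   # every 10 minutes
      class: AiProviderHealthCheckJob
    ai_knowledge_doc_sync:
      cron: "30 5 * * *"     # daily at 5:30 AM
      class: AiKnowledgeDocSyncJob
YAML

# Keys written as :name: parse to Ruby symbols, so Symbol must be permitted.
config = YAML.safe_load(SCHEDULE_YAML, permitted_classes: [Symbol])
config[:schedule].each do |name, entry|
  puts "#{name}: #{entry['class']} @ #{entry['cron']}"
end
```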

Middleware

Sidekiq Web Authentication

File: app/middleware/sidekiq_web_auth.rb

Rack middleware for the Sidekiq Web dashboard. Authenticates users via email/password against the backend API using WebAuthApiClient.

  • Health check endpoint (/health) bypasses auth
  • Static assets bypass auth
  • Login form submits credentials to backend for validation
  • Sessions maintained via Rack session cookies
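The bypass-then-authenticate flow can be sketched as a plain Rack-compatible class (anything whose `call(env)` returns `[status, headers, body]`). The public path prefixes, login path, and session key are hypothetical, and the backend credential check that WebAuthApiClient performs is reduced to an injected lambda so the sketch runs without Rack or the API.

```ruby
require "uri"

# Hypothetical sketch of the SidekiqWebAuth flow; paths and session key are assumptions.
class SidekiqWebAuthSketch
  PUBLIC_PREFIXES = ["/health", "/stylesheets", "/javascripts", "/images"].freeze # assumed
  LOGIN_PATH = "/login"            # assumed
  SESSION_KEY = "sidekiq_web_user" # assumed
  LOGIN_FORM = '<form method="post" action="/login">' \
               '<input name="email"><input name="password" type="password"><button>Log in</button></form>'

  # verify_credentials stands in for the WebAuthApiClient call to the backend.
  def initialize(app, verify_credentials:)
    @app = app
    @verify_credentials = verify_credentials
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    # Health check and static assets skip authentication entirely.
    return @app.call(env) if PUBLIC_PREFIXES.any? { |prefix| path.start_with?(prefix) }

    session = (env["rack.session"] ||= {})
    return @app.call(env) if session[SESSION_KEY]

    return [302, { "location" => LOGIN_PATH }, []] unless path == LOGIN_PATH
    return [200, { "content-type" => "text/html" }, [LOGIN_FORM]] unless env["REQUEST_METHOD"] == "POST"

    # Credentials come from the url-encoded form body and are validated by the backend.
    params = URI.decode_www_form(env["rack.input"].read).to_h
    if @verify_credentials.call(params["email"], params["password"])
      session[SESSION_KEY] = params["email"]
      [302, { "location" => "/" }, []]
    else
      [401, { "content-type" => "text/plain" }, ["Invalid credentials"]]
    end
  end
end
```

Checking the public prefixes first keeps the health endpoint usable by orchestrators even when the backend auth API is down.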

Monitoring & Error Handling

Execution Tracking

BaseJob automatically tracks in Redis:

  • Execution timestamps: job_executions:{key} — last 20 per job
  • Success markers: job_success:{key} — 5-minute TTL
  • Failure records: job_failures:{key} — last 10 failures, 1-hour TTL

Runaway Loop Detection

BaseJob detects and prevents runaway job loops:

  • 5 executions in 1 minute → job disabled for 5 minutes
  • 15 executions in 5 minutes → 5-second delay injected
  • Disabled jobs are stored in job_disabled:{key} along with the reason
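The two thresholds can be sketched as sliding-window counts over recent execution timestamps. This keeps timestamps in a hypothetical in-memory hash where BaseJob uses Redis (job_executions:{key}, last 20), returns invented status symbols, and assumes the fifth execution within 60 seconds trips the disable; the caller would sleep 5 seconds on `:throttle`.

```ruby
# Hypothetical in-memory sketch of the runaway-loop guard.
class RunawayGuardSketch
  HISTORY_LIMIT = 20      # mirrors "last 20 per job"
  DISABLE_THRESHOLD = 5   # executions within 60 s
  DISABLE_SECONDS = 5 * 60
  THROTTLE_THRESHOLD = 15 # executions within 300 s
  THROTTLE_DELAY = 5      # seconds the caller should wait on :throttle

  def initialize
    @history = Hash.new { |h, k| h[k] = [] }
    @disabled_until = {}
  end

  # Records an execution at `now` (seconds) and returns :disabled, :throttle, or :ok.
  def check(key, now)
    return :disabled if @disabled_until.fetch(key, -Float::INFINITY) > now

    times = @history[key]
    times << now
    times.shift while times.size > HISTORY_LIMIT

    if times.count { |t| t > now - 60 } >= DISABLE_THRESHOLD
      @disabled_until[key] = now + DISABLE_SECONDS
      :disabled
    elsif times.count { |t| t > now - 300 } >= THROTTLE_THRESHOLD
      :throttle
    else
      :ok
    end
  end
end
```

The disable check runs first, so a tight loop is stopped outright rather than merely slowed.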

Metrics

Jobs record metrics to Redis (job_metrics:{name}):

  • Counter metrics (execution counts, error counts)
  • Gauge metrics (performance timing, cleanup stats)
  • Retained for 24 hours, last 1000 entries per metric
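The retention rules (24 hours, last 1000 entries per metric) can be sketched as a time-pruned, size-capped list. The in-memory storage, the `Entry` layout, and the method signatures below are hypothetical stand-ins for the Redis-backed job_metrics:{name} keys; only the helper names come from the BaseJob table.

```ruby
# Hypothetical in-memory version of the job_metrics:{name} retention rules.
class MetricsStoreSketch
  MAX_ENTRIES = 1000
  RETENTION_SECONDS = 24 * 60 * 60

  Entry = Struct.new(:at, :kind, :value)

  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @series = Hash.new { |h, k| h[k] = [] }
  end

  def increment_counter(name, by = 1)
    append(name, :counter, by)
  end

  def track_performance_metric(name, seconds)
    append(name, :gauge, seconds)
  end

  # Sum of counter increments still inside the retention window.
  def counter_total(name)
    prune(name)
    @series[name].select { |e| e.kind == :counter }.sum(&:value)
  end

  def size(name)
    prune(name)
    @series[name].size
  end

  private

  def append(name, kind, value)
    @series[name] << Entry.new(@clock.call, kind, value)
    prune(name)
  end

  # Drop entries older than 24 hours, then keep only the newest 1000.
  def prune(name)
    cutoff = @clock.call - RETENTION_SECONDS
    entries = @series[name]
    entries.shift while entries.any? && entries.first.at < cutoff
    entries.shift while entries.size > MAX_ENTRIES
  end
end
```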

Structured Logging

log_info("Processing mission phase", mission_id: id, phase: "executing")
log_error("Execution failed", exception, mission_id: id)
log_warn("Rate limit approaching", remaining: 10)
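One way such helpers could emit machine-readable lines is a Logger formatter that serializes the message and metadata as one JSON object per line. This is a sketch under assumptions: the module and class names and the JSON field names (`level`, `time`, `job`, `error_class`) are hypothetical, and BaseJob's actual log format is not specified here.

```ruby
require "logger"
require "json"
require "time"

# Hypothetical structured-logging helpers; field names are assumptions.
module StructuredLoggingSketch
  def self.build_logger(io = $stdout)
    Logger.new(io).tap do |logger|
      # Logger passes the hash given to #info/#warn/#error through as `payload`.
      logger.formatter = proc do |severity, time, _progname, payload|
        JSON.generate({ level: severity, time: time.utc.iso8601 }.merge(payload)) + "\n"
      end
    end
  end

  def log_info(message, **metadata)
    logger.info({ message: message, job: self.class.name }.merge(metadata))
  end

  def log_warn(message, **metadata)
    logger.warn({ message: message, job: self.class.name }.merge(metadata))
  end

  def log_error(message, exception = nil, **metadata)
    details = exception ? { error_class: exception.class.name, error: exception.message } : {}
    logger.error({ message: message, job: self.class.name }.merge(details).merge(metadata))
  end
end

class MissionJobSketch
  include StructuredLoggingSketch
  attr_reader :logger

  def initialize(io = $stdout)
    @logger = StructuredLoggingSketch.build_logger(io)
  end
end
```

Writing each event as a single JSON line lets log aggregators filter on fields such as `mission_id` without regex parsing.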

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| WORKER_ENV | development | Worker environment |
| REDIS_URL | redis://localhost:6379/1 | Redis for Sidekiq |
| WORKER_CONCURRENCY | 5 | Sidekiq thread count |
| BACKEND_API_URL | http://localhost:3000 | Backend API endpoint |
| WORKER_SERVICE_TOKEN | (none; must be configured) | JWT signing secret |
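Reading these variables with their documented defaults can be sketched as a small struct loader. The `WorkerConfigSketch` name and the choice to fail fast when WORKER_SERVICE_TOKEN is absent are assumptions; the table gives no default for the token but does not say how a missing value is handled.

```ruby
# Hypothetical loader applying the defaults from the table above.
WorkerConfigSketch = Struct.new(:env, :redis_url, :concurrency, :backend_api_url, :service_token,
                                keyword_init: true) do
  # `source` defaults to ENV but accepts any hash, which keeps the loader testable.
  def self.load(source = ENV)
    new(
      env: source.fetch("WORKER_ENV", "development"),
      redis_url: source.fetch("REDIS_URL", "redis://localhost:6379/1"),
      concurrency: Integer(source.fetch("WORKER_CONCURRENCY", "5")),
      backend_api_url: source.fetch("BACKEND_API_URL", "http://localhost:3000"),
      # Assumed fail-fast: fetch without a default raises KeyError when the secret is missing.
      service_token: source.fetch("WORKER_SERVICE_TOKEN")
    )
  end
end
```

`Integer(...)` rejects malformed values such as `"five"` at boot instead of silently falling back to 0.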

Sidekiq Configuration

concurrency: 5         # Override with WORKER_CONCURRENCY
timeout: 300           # 5 minutes for in-flight jobs to finish on shutdown
retry: 3               # Default retry count
poll_interval: 5/15    # 5s in development, 15s in production
dead_timeout: 1209600  # 2 weeks (14 days) of dead job retention

Key Files

| File | Purpose |
|---|---|
| worker/config/sidekiq.yml | Queue config, scheduling, Redis settings |
| worker/app/jobs/base_job.rb | Base job class (all jobs inherit) |
| worker/app/services/backend_api_client.rb | Primary API client |
| worker/app/services/llm_proxy_client.rb | AI model proxy client |
| worker/app/middleware/sidekiq_web_auth.rb | Web dashboard auth |
| worker/app/services/concerns/circuit_breaker.rb | Circuit breaker implementation |
| worker/app/services/concerns/distributed_lock.rb | Distributed locking |

See Also