Powernode's system extension. Node lifecycle, modules, SDWAN, fleet autonomy, container runtimes, disk image CI, and the on-node Go agent.
This file is the index for AI sessions touching extensions/system/. Each domain points at its operator guide + critical source files.

| Domain | Operator Guide | Key Source Files |
|---|---|---|
| Node lifecycle | `docs/ARCHITECTURE.md` §2 | `app/models/system/{node,node_instance,node_template,node_architecture,node_platform}.rb`, `app/services/system/{enrollment,bootstrap,provisioning,instance_control}_service.rb` |
| Modules + categories + assignments | `docs/ARCHITECTURE.md` §1 | `app/models/system/{node_module,node_module_category,node_module_assignment,node_module_version}.rb`, `app/services/system/{module_version,module_build,module_publication_processor,module_oci_ingest}_service.rb` |
| Container runtimes (Phase 1 Docker + Phase 2 K3s) | `docs/CONTAINER_RUNTIMES.md` | `app/services/system/docker_daemon_provisioner_service.rb`, `app/services/system/kubernetes_cluster_provisioner_service.rb`, `app/controllers/api/v1/system/node_api/runtime_controller.rb`, `agent/internal/dockerd/`, `agent/internal/k3sd/` |
| SDWAN (slices 1–9) | `docs/ARCHITECTURE.md` §5 | `app/models/sdwan/`, `app/services/sdwan/`, `app/controllers/api/v1/system/sdwan/` |
| Fleet autonomy + sensors | `docs/FLEET_SENSORS.md`, `docs/ARCHITECTURE.md` §4 | `app/services/system/fleet/sensors/`, `app/services/fleet_autonomy_service.rb`, `db/seeds/fleet_autonomy_agent.rb` |
| Skill executors | `docs/SKILL_EXECUTORS.md` | `app/services/system/ai/skills/` (40 executors), `db/seeds/system_skills_seed.rb` |
| Disk image CI | `docs/DISK_IMAGE_CI.md` | `app/models/system/{disk_image_publication,disk_image_webhook}.rb`, `app/services/system/disk_image_*_service.rb` |
| CI workers + Gitea Actions | (cross-cuts disk image CI) | `app/services/system/{worker_dispatch,execution_dispatcher}.rb` |
| Tasks + autonomy reconcile | `docs/ARCHITECTURE.md` §4 | `app/models/system/task.rb`, `app/services/system/runtime_task_dispatcher.rb` |
| Honeypot canaries | `docs/ARCHITECTURE.md` §7 | `app/services/system/honeypot/canary_module_service.rb` |

The system extension seeds seven AI agents, each with its own trust score and approval chain. The 2026-05-10 split yielded Concierge, Fleet Autonomy, Runtime Manager, CVE Responder, SDWAN Manager, and Disk Image Manager, replacing an earlier 3-agent model in which Fleet Autonomy owned CVE, SDWAN, and Disk Image work. Phase O6 then added System Topology Designer as the first specialist in the cross-cutting design track. Each domain has its own queue, so operators can pause one (e.g. SDWAN during maintenance) without halting the others. Note: Fleet Autonomy's seed file is `db/seeds/fleet_autonomy_agent.rb` (no `system_` prefix, because it predates the naming convention); the other six follow `db/seeds/system_<name>_agent.rb`.
- System Concierge (`assistant`, chat): operator chat agent. `concierge_tool_filter` covers `system_*`, `docker_*`, `kubernetes_*`, plus `discover_skills`/`get_skill_context`/`request_confirmation`. 4 read-shape skills bound. Seeded by `db/seeds/system_concierge_agent.rb`.
- Fleet Autonomy (`monitor`): non-CVE fleet reconciler running every 60s. Cert rotation, drift remediation, module composition, rolling upgrades, package repository/module ops. 10 skills bound. ~13 intervention policies (CVE policies moved to CVE Responder). Seeded by `db/seeds/fleet_autonomy_agent.rb`.
- Runtime Manager (`monitor`): Phase 1 Docker + Phase 2 K3s lifecycle. 2 skills bound (`docker_provision`, `provision_cluster`). 8 intervention policies. Distinct approval chain so container runtime changes route separately. Seeded by `db/seeds/system_runtime_manager_agent.rb`.
- CVE Responder (`monitor`): security-focused reconciler running every 60s via `SystemCveResponderReconcileJob`. Owns the full chain: CVE ingest (via hourly `SystemCveFeedJob`) → exposure scan → triage → critical-upgrade detection → orchestrated rebuild + rolling upgrade. 5 skills bound (`cve_response`, `cve_remediation_orchestration`, `cve_runbook_generate`, `rolling_module_upgrade`, `package_module_refresh`). 5 intervention policies. 8h approval timeout (security responses span business days). Seeded by `db/seeds/system_cve_responder_agent.rb`. Sensors live in `app/services/system/cve_ops/sensors/`: `CvePublishedSensor` emits `system.cve_critical_published` for fresh critical/high exposures; `CriticalUpgradeAvailableSensor` emits `system.module_critical_upgrade_ready` only when drift AND open CveExposure intersect (the "patch already exists, fly it" path, which gets `notify_and_proceed`).
- SDWAN Manager (`monitor`): owns SDWAN peer drift, hub reachability, BGP session health, VIP failover, route policy audit, and operator-initiated SDWAN CRUD. 28 intervention policies; 4h approval timeout. Skills bound: `sdwan_*` reconciliation executors. Seeded by `db/seeds/system_sdwan_manager_agent.rb` (2026-05-10). Operator guide: `docs/SDWAN_MANAGER_AGENT.md`.
- Disk Image Manager (`monitor`): owns disk image CI publication lifecycle (build → verify → promote → retention). 6 intervention policies; 12h approval timeout; 5-minute tick. Seeded by `db/seeds/system_disk_image_manager_agent.rb` (2026-05-10). Operator guide: `docs/DISK_IMAGE_MANAGER_AGENT.md`. For the upstream CI pipeline see `docs/DISK_IMAGE_CI.md`.
- System Topology Designer (`assistant`): specialist agent for cross-cutting platform topology design (Phase O6, first specialist in the cross-cutting design track). Charter: SDWAN composition today (host bridges, OVN logical networks, IPFIX collectors); container networking + storage topology in future. Invoked by Concierge via `execute_agent` for topology composition. 5 compose skills bound: `system-sdwan-host-bridge-compose`, `system-sdwan-ovn-compose-topology`, `system-sdwan-ipfix-collector-compose`, `system-sdwan-compose-full-topology`, `system-sdwan-ovn-apply-acl`. Trust tier: monitored. Seeded by `db/seeds/system_topology_designer_agent.rb`.

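The CVE Responder's second sensor fires only on an intersection: a module must be drifted and also have an open exposure. The sketch below illustrates that condition in plain Ruby. The `DriftedModule` and `Exposure` structs, their fields, and the `critical_upgrade_ready` helper are hypothetical stand-ins, not the extension's models or sensor code, and the CVE identifiers are placeholders.

```ruby
require "set"

# Hypothetical, simplified records; the real models are not shown in this file.
DriftedModule = Struct.new(:module_name, :current_version, :available_version)
Exposure      = Struct.new(:module_name, :cve_id, :status)

# Returns the names of modules that have BOTH version drift and an open exposure.
# Drift alone, or an exposure alone, yields nothing.
def critical_upgrade_ready(drifted, exposures)
  exposed = exposures.select { |e| e.status == :open }.map(&:module_name).to_set
  drifted.map(&:module_name).select { |name| exposed.include?(name) }.uniq
end

drifted = [
  DriftedModule.new("openssl", "3.0.1", "3.0.2"),
  DriftedModule.new("nginx", "1.24.0", "1.25.0")
]
exposures = [
  Exposure.new("openssl", "CVE-0000-0001", :open),     # placeholder ID
  Exposure.new("nginx",   "CVE-0000-0002", :resolved), # closed, so ignored
  Exposure.new("curl",    "CVE-0000-0003", :open)      # exposed but not drifted
]

ready = critical_upgrade_ready(drifted, exposures)
```

With this sample data only `openssl` qualifies, which corresponds to the "patch already exists" path described for `system.module_critical_upgrade_ready`.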
System-extension MCP actions follow these prefixes:
- `system_*`: fleet ops, modules, instances, templates, tasks, container runtime provisioning, disk image CI
- `system_sdwan_*`: SDWAN management (~70 actions)
- `kubernetes_*`: Phase 2 K8s clusters (read + decommission + kubeconfig)
- `docker_*`: DockerHost CRUD + container/image/network/volume management (works on managed + external hosts)

The full action catalog regenerates via `cd server && bundle exec rails mcp:generate_tool_catalog` (gitignored at `docs/platform/MCP_TOOL_CATALOG.md`).
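Because `system_sdwan_*` is itself a `system_*` name, any code that classifies actions by prefix has to try the longer prefix first. The following is a minimal sketch of that idea. The `MCP_PREFIXES` table, the category symbols, and the `mcp_category` helper are invented for illustration, and the `*_example` action names are hypothetical.

```ruby
# Hypothetical prefix table; the four prefixes come from the list above,
# the category symbols are shorthand invented for this sketch.
MCP_PREFIXES = {
  "system_"       => :system,
  "system_sdwan_" => :sdwan,
  "kubernetes_"   => :kubernetes,
  "docker_"       => :docker
}.freeze

# Returns the category for an MCP action name, or nil when no prefix matches.
# Prefixes are tried longest-first so system_sdwan_* is not swallowed by system_*.
def mcp_category(action)
  MCP_PREFIXES
    .sort_by { |prefix, _category| -prefix.length }
    .each { |prefix, category| return category if action.start_with?(prefix) }
  nil
end

mcp_category("system_sdwan_example") # => :sdwan (hypothetical action name)
mcp_category("system_example")       # => :system (hypothetical action name)
mcp_category("discover_skills")      # => nil, no system-extension prefix
```

Sorting by length keeps the lookup correct even if the table is later reordered or extended with another nested prefix.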
- Always check existing skill executors before writing a new orchestration. 40 already cover most fleet/SDWAN/runtime/topology workflows. See `docs/SKILL_EXECUTORS.md`.
- New skills must have BOTH an executor at `app/services/system/ai/skills/<name>_executor.rb` AND an `Ai::Skill` record (seeded via `db/seeds/system_skills_seed.rb`).
- New autonomy actions must have a `system.<action>` intervention policy entry in either `fleet_autonomy_agent.rb` or `system_runtime_manager_agent.rb`.
- Cross-account safety: use `find_or_create_by` with `account: account` scoping. The KG seeds + skill seeds follow this pattern.

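The cross-account rule is easiest to see with a toy example. The class below is an in-memory stand-in that mimics the shape of ActiveRecord's `find_or_create_by(attributes)` so the behaviour is visible without a database. `FakeSkillTable`, its `Row` struct, and the account and skill values are all hypothetical; the real seeds operate on actual models.

```ruby
# In-memory stand-in for a model class, used only to show why the account
# key belongs in the lookup. This is not the extension's code.
class FakeSkillTable
  Row = Struct.new(:account, :name)

  def initialize
    @rows = []
  end

  # Same shape as ActiveRecord's find_or_create_by(attributes): return the
  # first row matching every given attribute, otherwise create one.
  def find_or_create_by(attrs)
    found = @rows.find { |row| attrs.all? { |key, value| row[key] == value } }
    return found if found

    row = Row.new(attrs[:account], attrs[:name])
    @rows << row
    row
  end

  def count
    @rows.size
  end
end

table = FakeSkillTable.new

# Scoped lookups: each account gets its own row, and re-running is idempotent.
a = table.find_or_create_by(account: "account-a", name: "cve_response")
b = table.find_or_create_by(account: "account-b", name: "cve_response")
table.find_or_create_by(account: "account-a", name: "cve_response")

# Unscoped lookup: silently returns account-a's row, whoever is asking.
leaked = table.find_or_create_by(name: "cve_response")
```

After these calls the table holds two rows, and `leaked` is the same object as `a`, which is the cross-account mix-up the scoping rule is meant to prevent.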
This is a git submodule. Per root CLAUDE.md:
- Always run `git rev-parse --show-toplevel` before `git add/commit`
- Commit inside the submodule first, then bump the parent's submodule pointer
- The system extension is dual-remoted: `origin` = private Gitea, `github` = public GitHub mirror (MIT)

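The `git rev-parse --show-toplevel` check can be wrapped in a small guard that a script runs before staging anything. This is a sketch under one assumption: the submodule is checked out at a path ending in `extensions/system`, as the path at the top of this file suggests. The helper name `submodule_toplevel?` is hypothetical.

```ruby
require "pathname"

# True when the toplevel reported by git is the submodule checkout rather
# than the parent repository. The suffix is an assumption about the layout.
def submodule_toplevel?(toplevel, expected_suffix = "extensions/system")
  Pathname.new(toplevel.strip).cleanpath.to_s.end_with?("/#{expected_suffix}")
end

# Typical use inside a script (not executed here):
#   toplevel = `git rev-parse --show-toplevel`
#   abort "not inside the submodule: #{toplevel}" unless submodule_toplevel?(toplevel)
```

The `strip` call handles the trailing newline that backtick command output carries, and `cleanpath` normalizes a trailing slash.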
- RSpec specs under `server/spec/`
- Live smoke tests under `server/db/seeds/smoke_test_*.rb`, run via `cd server && rails runner "load Rails.root.join('../extensions/system/server/db/seeds/smoke_test_<name>.rb')"`
- Go agent tests under `agent/internal/*/`, run via `cd agent && go test ./...`

- `README.md`: extension overview
- `CONTRIBUTING.md`: submodule + commit workflow
- `docs/ARCHITECTURE.md`: 8 subsystems + 4 API surfaces + security architecture
- `docs/SMOKE_TEST.md`: platform-level smoke catalog (16 seeded scripts, 7 passes: boot, container runtimes, SDWAN, federation, ACME, storage, credentials)
- `docs/CONTAINER_RUNTIMES.md`: Phase 1 Docker + Phase 2 K3s operator guide + troubleshooting
- `docs/USE_CASE_MATRIX.md`: what works / what doesn't / what to expect for 10 NodeInstance container use cases (READ FIRST when designing a deployment)
- `docs/SKILL_EXECUTORS.md`: 40 executor reference; `docs/SKILL_EXECUTOR_CATALOG.md` is the auto-generated catalog (regenerate via `rails system:skills:generate_catalog`; never hand-edit)
- `docs/FLEET_SENSORS.md`: 12 sensor reference + intervention policy table
- `docs/DISK_IMAGE_CI.md`: webhook + CI worker workflow
- `docs/MCP_API_REFERENCE.md`: `system_*` / `system_sdwan_*` / `kubernetes_*` / `docker_*` MCP tool actions
- `docs/agent-peering.md`: NodeInstance-as-Agent pattern
- `docs/credential-restoration.md`: Vault credential lifecycle
- `docs/gitops.md`: GitOps reconciler design
- `docs/history/`: archived phase plans + acceptance reports (TASKS, missing-features, federation phase-reports)
- `initramfs/README.md`: multi-arch boot builder

See docs/runbooks/README.md for the full index (audience + prereqs + runtime per runbook). Current set:
- `node-provisioning.md`: full Node + NodeInstance lifecycle with per-state error recovery
- `sdwan-network-setup.md`: SDWAN end-to-end (networks, peers, VIPs, firewall, BGP, federation)
- `module-authoring.md`: author + register + sign + publish a new NodeModule
- `cve-response.md`: full CVE response workflow (SBOM-aware matching, triage, remediation)
- `gitops-reconciliation.md`: operator GitOps reconciler workflow (Phase A4)
- `acme-issuance.md`: ACME DNS-01 cert lifecycle (Phase A4)
- `acme-smoke.md`: P2.5.7 acceptance smoke test
- `instance-pool-tuning.md`: pool sizing + reaping (slice 7)
- `multi-cluster-k3s.md`: multi-cluster K3s with `metadata.target_cluster_id` + HA control plane
- `disk-image-ci.md`: disk image CI operator workflow
- `federation-setup.md`: multi-region/multi-account federation peering
- `federation-troubleshooting.md`: diagnostic procedures for federation failures
- `docker-compose-cutover.md`: legacy compose → Powernode migration
- `vault-credential-restoration.md`: DR runbook for credential restoration

12 numbered, dependency-aware tutorials covering the full operator surface:
- `01-first-boot.md`: single-node QEMU boot end-to-end
- `02-first-module.md`: author + sign + publish a custom module
- `03-docker-runtime.md`: Phase 1 Docker daemon provisioning
- `04-k3s-cluster.md`: Phase 2 K3s cluster with VIP-backed api_endpoint
- `05-multi-cluster-k3s.md`: multi-cluster + SDWAN isolation
- `06-rolling-upgrade.md`: batched module upgrades with circuit breaker
- `07-cve-response.md`: full CVE response pipeline (drill)
- `08-instance-pool.md`: pre-warmed pools for bursty workloads
- `09-honeypot-canary.md`: decoy assets + intervention policy
- `10-gitops-fleet.md`: fleet.yaml declarative state + reconciler
- `11-federation.md`: multi-region federation, spawn modes, P9.x guarantees
- `12-disk-image-ci.md`: custom NodePlatform via CI-published OCI artifacts

Start with docs/tutorials/INDEX.md for a Mermaid decision tree mapping operator goal → starting tutorial.
<parent>/docs/system/threat-model.md— STRIDE threat analysis across 6 attack surfaces (operator API, worker API, node API, MCP tools, internal CA, GitHub mirror)