
feat(auto scaling): implement NiFi auto-scaling with graceful node decommissioning #915

Open
soenkeliebau wants to merge 6 commits into main from feat/autoscale

Conversation

@soenkeliebau
Member

Summary

Adds HPA-driven auto-scaling for NiFi clusters with graceful node decommissioning via the
NiFi REST API. Scaling is configured per role group through the new ReplicasConfig enum,
and the operator manages StackableScaler and HPA resources as implementation details.

  • NifiScalingHooks -- implements the ScalingHooks trait with version-aware
    decommissioning sequences:
    • NiFi 1.x: CONNECTED -> OFFLOADING -> OFFLOADED -> DISCONNECTING -> DISCONNECTED -> DELETE
    • NiFi 2.x: CONNECTED -> DISCONNECTING -> DISCONNECTED -> OFFLOADING -> OFFLOADED -> DELETE
    • Scale-up is a no-op (NiFi nodes self-register on startup)
  • NifiApiClient -- authenticated REST API client for NiFi cluster management:
    • SingleUser credential resolution from Kubernetes Secrets
    • Bearer token authentication
    • Endpoints: /controller/cluster (list nodes), /controller/cluster/nodes/{id}
      (set status, delete node)
  • ReplicasConfig-based reconcile -- replaces the old integer-based replicas field:
    • Fixed(n): static replica count, no scaler/HPA created
    • Hpa(config): creates StackableScaler + HPA, runs state machine on each reconcile
    • ExternallyScaled: creates StackableScaler without HPA for user-managed scaling
    • Auto: returns explicit "not yet implemented" error
  • Replicas preservation -- reads existing StackableScaler's spec.replicas before
    rebuilding to prevent overwriting HPA-managed values with initial defaults
  • Watch registration -- .owns() for both StackableScaler and
    HorizontalPodAutoscaler so changes trigger NiFi cluster reconciliation
  • RBAC -- full CRUD on stackablescalers and stackablescalers/status in
    autoscaling.stackable.tech, plus full CRUD on horizontalpodautoscalers in autoscaling
  • Documentation -- comprehensive auto-scaling guide covering configuration, status
    inspection, scale-down behavior, failure recovery, and current limitations
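The two version-specific decommission orders above can be expressed as plain data. The following is a minimal illustrative sketch, not the operator's actual API: `NodeState` and `decommission_sequence` are hypothetical names chosen to mirror the description.

```rust
// Hypothetical sketch of the version-aware scale-down order described above.
// NiFi 1.x offloads before disconnecting; NiFi 2.x disconnects first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeState {
    Connected,
    Offloading,
    Offloaded,
    Disconnecting,
    Disconnected,
    Delete,
}

/// Returns the decommission sequence for a NiFi major version.
fn decommission_sequence(major: u8) -> Vec<NodeState> {
    use NodeState::*;
    match major {
        1 => vec![Connected, Offloading, Offloaded, Disconnecting, Disconnected, Delete],
        _ => vec![Connected, Disconnecting, Disconnected, Offloading, Offloaded, Delete],
    }
}

fn main() {
    println!("NiFi 1.x: {:?}", decommission_sequence(1));
    println!("NiFi 2.x: {:?}", decommission_sequence(2));
}
```

Encoding the order as data rather than branching at each step keeps the two version paths easy to compare and test side by side.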

fixes stackabletech/issues#667

User-facing configuration

```yaml
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
spec:
  nodes:
    roleGroups:
      default:
        replicas:
          hpa:
            maxReplicas: 10
            minReplicas: 3
            metrics:
              - type: Resource
                resource:
                  name: cpu
                  target:
                    type: Utilization
                    averageUtilization: 80
```

The operator creates the StackableScaler and HPA automatically. Users never interact with
these resources directly.
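The mapping from `ReplicasConfig` variant to managed resources can be sketched as follows. This is a simplified illustration; the variant payloads and the `managed_resources` helper are hypothetical, not the operator's exact types.

```rust
// Simplified model of the ReplicasConfig variants described above.
#[derive(Debug)]
enum ReplicasConfig {
    Fixed(u32),
    Hpa { min_replicas: u32, max_replicas: u32 },
    ExternallyScaled,
    Auto,
}

/// Which managed resources each variant implies: (StackableScaler, HPA).
/// `Auto` is reserved and returns an explicit error for now.
fn managed_resources(config: &ReplicasConfig) -> Result<(bool, bool), String> {
    match config {
        ReplicasConfig::Fixed(_) => Ok((false, false)),
        ReplicasConfig::Hpa { .. } => Ok((true, true)),
        ReplicasConfig::ExternallyScaled => Ok((true, false)),
        ReplicasConfig::Auto => Err("auto scaling mode not yet implemented".to_string()),
    }
}

fn main() {
    let cfg = ReplicasConfig::Hpa { min_replicas: 3, max_replicas: 10 };
    println!("{:?}", managed_resources(&cfg));
}
```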

Authentication

Only SingleUser authentication is currently supported for the NiFi REST API calls during
scaling. LDAP and OIDC configurations return an explicit UnsupportedScalerAuthentication
error. This limitation is documented and will be addressed in a follow-up.
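The gating described here amounts to a single match on the resolved authentication method. A minimal sketch, assuming hypothetical `AuthMethod` and `ScalerError` types that only mirror the behavior described above:

```rust
// Sketch of the authentication gating for scaler REST calls: only
// SingleUser yields credentials, everything else is rejected explicitly.
#[derive(Debug, Clone, PartialEq)]
enum AuthMethod {
    SingleUser { username: String, password: String },
    Ldap,
    Oidc,
}

#[derive(Debug, PartialEq)]
enum ScalerError {
    UnsupportedScalerAuthentication(String),
}

/// Resolve credentials for NiFi REST calls, rejecting unsupported methods.
fn resolve_scaler_credentials(auth: &AuthMethod) -> Result<(String, String), ScalerError> {
    match auth {
        AuthMethod::SingleUser { username, password } => {
            Ok((username.clone(), password.clone()))
        }
        other => Err(ScalerError::UnsupportedScalerAuthentication(format!("{:?}", other))),
    }
}

fn main() {
    let auth = AuthMethod::SingleUser { username: "admin".into(), password: "secret".into() };
    println!("{:?}", resolve_scaler_credentials(&auth));
}
```

Returning a dedicated error variant (rather than silently skipping scaling) makes the limitation visible in cluster status instead of failing later at the REST call.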

Dependencies

Test plan

  • cargo test --all-features passes -- unit tests cover pod FQDN construction, API URL
    generation, and NiFi version detection
  • cargo clippy --all-targets --all-features -- -D warnings clean
  • Manual integration test: deploy NiFi cluster with replicas: { hpa: ... } config,
    verify StackableScaler and HPA are created
  • Scale-up: HPA increases replicas -> new pods start -> state machine completes
    (PreScaling no-op -> Scaling -> PostScaling no-op -> Idle)
  • Scale-down: HPA decreases replicas -> pre_scale hook offloads/disconnects/deletes nodes
    via REST API -> StatefulSet scaled down -> state machine completes
  • NiFi 1.x vs 2.x: verify correct decommission sequence for each version
  • Mid-scaling HPA update blocked by admission webhook
  • Failed state recovery via retry annotation
  • Fixed(n) config: no scaler/HPA created, behaves as before
  • Reporting task service selector works with ReplicasConfig-based role groups

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non-trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

soenkeliebau and others added 6 commits March 11, 2026 09:39
…ration

Add NiFi-specific scaling hooks that drive node offload, disconnect, and
deletion via the NiFi REST API before the StatefulSet replica count is
reduced. Supports both NiFi 1.x (offload-first) and 2.x (disconnect-first)
scale-down sequences.

Key components:
- NifiScalingHooks implementing the ScalingHooks trait
- NifiApiClient for REST API calls (connect, cluster nodes, status updates)
- Credential resolution from Kubernetes Secrets
- Controller integration with StackableScaler reconciliation
- RBAC, Helm config, and generated files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document how to configure horizontal auto-scaling for NiFi clusters
using StackableScaler and HPA, including configuration steps, status
inspection, scale-down decommission behavior, failure recovery via
the retry annotation, and current limitations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Explain that the label is required, auto-injected by the mutating
webhook in commons-operator, and harmless to set explicitly in
manifests for clarity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace replicas: 0 convention with ReplicasConfig enum matching.
Create StackableScaler and HPA via ClusterResources.add().
Switch from .watches() to .owns() for scaler event routing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…update

Part of the ReplicasConfig rewrite: replace the raw
HorizontalPodAutoscalerSpec wrapper in HpaConfig with a custom struct
exposing only user-relevant fields, add replicas preservation support,
and align Label::managed_by usage with ClusterResources conventions.

- Preserve existing StackableScaler spec.replicas across reconciles by
  reading the current value via `client.get_opt::<StackableScaler>()`
  before building the scaler object. This prevents server-side apply
  from resetting externally-set replica counts (e.g. from the HPA).

- Pass `NIFI_CONTROLLER_NAME` to `build_scaler()` and
  `build_hpa_from_user_spec()` for correct `managed-by` label format.

- Register `.owns::<HorizontalPodAutoscaler>()` on the controller so
  that HPA changes trigger reconciliation.

- Add dedicated error variants (BuildHpa, GetExistingScaler,
  ApplyScaler, ApplyHpa) instead of reusing ApplyRoleGroupStatefulSet
  for scaler/HPA operations.

- Update RBAC roles to include create/delete/update verbs for
  StackableScaler and full CRUD for HorizontalPodAutoscaler.

- Update `hpa_config.spec` call sites to `hpa_config.as_ref()` to
  match the new flat HpaConfig struct from operator-rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Implement Shared AutoScaling Hook Functionality
