Add production storage guidance for Scheduler PVs: premium SSD tier and multi-zone storage class

**What content needs to be created or modified?**

The Dapr Scheduler persistence documentation (`kubernetes-persisting-scheduler.md`) gives sizing guidance (64 GiB recommended) but does not address two production-critical properties of the underlying storage:

1. **Performance characteristics (IOPS / latency).** The Scheduler's embedded etcd is sensitive to disk latency. Standard-tier storage classes can produce `leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk` warnings and degraded reminder dispatch behaviour under sustained write load.
2. **Zone affinity.** Many cloud default StorageClasses provision zonal disks, which pins each PV to a single availability zone (e.g. Azure `StandardSSD_LRS` with `volumeBindingMode: WaitForFirstConsumer`). When a node upgrade or zonal disruption forces the Scheduler pod to reschedule into a different zone, the PV cannot follow, which blocks Scheduler recovery until the original zone is available again.

Operators who follow the existing guidance with cloud-default storage classes can end up with a Scheduler that is prone to disk-latency-driven instability and cannot recover cleanly during node upgrades, DR exercises, or zone failures.

**Describe the solution you'd like**

In the Storage class section of the Scheduler persistence page, add explicit production guidance:
- Recommend premium / SSD-backed storage classes for production deployments (to give etcd the IOPS and latency profile it expects).
- Recommend storage classes that support multi-zone failover (zone-redundant or regional persistent disks) so Scheduler PVCs are not locked to a single AZ.
- Briefly explain the failure modes when the above are not followed: slow-disk heartbeat warnings and inability to reschedule across zones during cluster upgrades / zone events.
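
As a rough illustration of the two recommendations above, the docs could include a StorageClass sketch along these lines. It assumes an AKS cluster using the Azure Disk CSI driver; the class name `scheduler-premium-zrs` is hypothetical, and the `Retain` reclaim policy is an assumption rather than something this issue specifies. Other clouds would need their own premium, zone-redundant or regional equivalent.

```yaml
# Sketch only: a premium, zone-redundant StorageClass for Scheduler PVCs.
# Assumes the Azure Disk CSI driver; the name below is hypothetical.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: scheduler-premium-zrs
provisioner: disk.csi.azure.com
parameters:
  # Premium SSD with zone-redundant replication, so the disk is not
  # tied to a single availability zone.
  skuName: Premium_ZRS
# Assumption: keep the disk if the PVC is deleted, to protect Scheduler data.
reclaimPolicy: Retain
# Delay provisioning until the Scheduler pod has been scheduled.
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

The class would then be selected when installing Dapr, for example through the `dapr_scheduler.cluster.storageClassName` Helm value that the persistence page already describes; the exact value name should be checked against the chart version in use.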

**Where should the new material be placed?**

`daprdocs/content/en/operations/hosting/kubernetes/kubernetes-persisting-scheduler.md`, in the Storage class subsection following the existing 64 GiB sizing note. This keeps all storage-related guidance grouped together.

**The associated pull request from dapr/dapr, dapr/components-contrib, or other Dapr code repos**

Docs-only change — implementing PR: https://github.com/dapr/docs/pull/5179

**Additional context**

This guidance reflects real-world operational signal from production deployments:
- Scheduler etcd logging recurring `slow disk` heartbeat warnings under sustained reminder/job load on Standard-tier storage classes.
- Zone-pinned scheduler PVCs blocking recovery during rolling node upgrades, leaving the Scheduler unable to reschedule until the original zone became available again — extending downtime windows by hours in observed cases.
- DR exercises failing because the scheduler PV could not be relocated to a healthy zone.

Add production storage guidance for Scheduler PVs: premium SSD tier and multi-zone storage class #5180
