Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,7 @@ Use this field only in case of
## Environment variables

!!!important

Environment variables reserved for operator usage (names starting with `PG` or
`CNP_`, plus `POD_NAME`, `NAMESPACE`, and `CLUSTER_NAME`) cannot be set
through the `env` and `envFrom` fields and are rejected at admission time.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -379,6 +379,7 @@ The operator manages most of the [configuration options for PgBouncer](https://w
allowing you to modify only a subset of them.

!!!warning

The operator passes these settings directly to PgBouncer without validation.
To prevent configuration errors or crash loops, ensure each parameter is
supported by your specific PgBouncer image version.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,11 @@ data:
, state
, usename
, COALESCE(application_name, '') AS application_name
, COUNT(*)
, COALESCE(EXTRACT (EPOCH FROM (max(now() - xact_start))), 0) AS max_tx_secs
, pg_catalog.count(*)
, COALESCE(EXTRACT (EPOCH FROM (pg_catalog.max(pg_catalog.now() OPERATOR(pg_catalog.-) xact_start))), 0) AS max_tx_secs
FROM pg_catalog.pg_stat_activity
GROUP BY datname, state, usename, application_name
) sa ON states.state = sa.state
) sa ON states.state OPERATOR(pg_catalog.=) sa.state
WHERE sa.usename IS NOT NULL
metrics:
- datname:
Expand All @@ -55,10 +55,10 @@ data:

backends_waiting:
query: |
SELECT count(*) AS total
SELECT pg_catalog.count(*) AS total
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
ON blocking_locks.locktype OPERATOR(pg_catalog.=) blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
Expand All @@ -68,8 +68,8 @@ data:
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
AND blocking_locks.pid OPERATOR(pg_catalog.<>) blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid OPERATOR(pg_catalog.=) blocking_locks.pid
WHERE NOT blocked_locks.granted
metrics:
- total:
Expand Down Expand Up @@ -110,14 +110,14 @@ data:
pg_replication:
query: "SELECT CASE WHEN (
NOT pg_catalog.pg_is_in_recovery()
OR pg_catalog.pg_last_wal_receive_lsn() = pg_catalog.pg_last_wal_replay_lsn())
OR pg_catalog.pg_last_wal_receive_lsn() OPERATOR(pg_catalog.=) pg_catalog.pg_last_wal_replay_lsn())
THEN 0
ELSE GREATEST (0,
EXTRACT(EPOCH FROM (now() - pg_catalog.pg_last_xact_replay_timestamp())))
EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) pg_catalog.pg_last_xact_replay_timestamp())))
END AS lag,
pg_catalog.pg_is_in_recovery() AS in_recovery,
EXISTS (TABLE pg_stat_wal_receiver) AS is_wal_receiver_up,
(SELECT count(*) FROM pg_catalog.pg_stat_replication) AS streaming_replicas"
EXISTS (TABLE pg_catalog.pg_stat_wal_receiver) AS is_wal_receiver_up,
(SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication) AS streaming_replicas"
metrics:
- lag:
usage: "GAUGE"
Expand Down Expand Up @@ -165,17 +165,17 @@ data:
query: |
SELECT archived_count
, failed_count
, COALESCE(EXTRACT(EPOCH FROM (now() - last_archived_time)), -1) AS seconds_since_last_archival
, COALESCE(EXTRACT(EPOCH FROM (now() - last_failed_time)), -1) AS seconds_since_last_failure
, COALESCE(EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) last_archived_time)), -1) AS seconds_since_last_archival
, COALESCE(EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) last_failed_time)), -1) AS seconds_since_last_failure
, COALESCE(EXTRACT(EPOCH FROM last_archived_time), -1) AS last_archived_time
, COALESCE(EXTRACT(EPOCH FROM last_failed_time), -1) AS last_failed_time
, COALESCE(CAST(CAST('x'||pg_catalog.right(pg_catalog.split_part(last_archived_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_archived_wal_start_lsn
, COALESCE(CAST(CAST('x'||pg_catalog.right(pg_catalog.split_part(last_failed_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_failed_wal_start_lsn
, COALESCE(CAST(CAST('x' OPERATOR(pg_catalog.||) pg_catalog.right(pg_catalog.split_part(last_archived_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_archived_wal_start_lsn
, COALESCE(CAST(CAST('x' OPERATOR(pg_catalog.||) pg_catalog.right(pg_catalog.split_part(last_failed_wal, '.', 1), 16) AS pg_catalog.bit(64)) AS pg_catalog.int8), -1) AS last_failed_wal_start_lsn
, EXTRACT(EPOCH FROM stats_reset) AS stats_reset_time
FROM pg_catalog.pg_stat_archiver
predicate_query: |
SELECT NOT pg_catalog.pg_is_in_recovery()
OR pg_catalog.current_setting('archive_mode') = 'always'
OR pg_catalog.current_setting('archive_mode') OPERATOR(pg_catalog.=) 'always'
metrics:
- archived_count:
usage: "COUNTER"
Expand Down Expand Up @@ -461,12 +461,12 @@ data:
pg_extensions:
query: |
SELECT
current_database() as datname,
pg_catalog.current_database() as datname,
name as extname,
default_version,
installed_version,
CASE
WHEN default_version = installed_version THEN 0
WHEN default_version OPERATOR(pg_catalog.=) installed_version THEN 0
ELSE 1
END AS update_available
FROM pg_catalog.pg_available_extensions
Expand Down
43 changes: 43 additions & 0 deletions product_docs/docs/postgres_for_kubernetes/1/failover.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,49 @@ expected outage.
Enabling a new configuration option to delay failover provides a mechanism to
prevent premature failover for short-lived network or node instability.

## Detection of node-level failures

When the node hosting the primary becomes unreachable (for example, due to a
kubelet crash or a network partition between the node and the Kubernetes API
server), the operator relies on the pod's `Ready` condition to decide that the
primary is no longer serviceable. While the node is healthy, the kubelet keeps
that condition up to date from the readiness probe; once the node stops
reporting, the Kubernetes node lifecycle controller flips the condition to
`False` as soon as it declares the node `Unknown`.

With stock kube-controller-manager settings, the transition is governed by
`--node-monitor-grace-period` (default `40s` on Kubernetes 1.29-1.31, raised
to `50s` in 1.32 and later): after that window the controller marks the node
`Unknown` and, in the same monitoring pass, issues a patch per pod on that
node to flip the `Ready` condition. In practice the operator observes the
primary as unready about **40 to 55 seconds** after the node becomes
unreachable (the grace period plus up to one `--node-monitor-period` poll,
default `5s`). Managed Kubernetes distributions (GKE, EKS, AKS) may tune
these values; consult the provider's documentation if the observed timing
does not match. After that, the failover procedure starts (further gated by
`.spec.failoverDelay`).
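
The timing above is simple arithmetic. As a rough sketch (the helper name `detection_window` is illustrative, and the inputs assume the stock kube-controller-manager defaults quoted above), the expected window can be computed as follows:

```sh
# Illustrative helper, not part of the operator or of Kubernetes.
# Prints the earliest and latest time (in seconds) at which the pod's
# Ready condition is expected to flip after the node becomes unreachable.
#   $1: --node-monitor-grace-period, in seconds
#   $2: --node-monitor-period, in seconds
detection_window() {
  grace=$1
  period=$2
  # Earliest: the grace period alone. Latest: grace period plus one poll.
  echo "$grace-$((grace + period))"
}

detection_window 40 5   # defaults on Kubernetes 1.29-1.31
detection_window 50 5   # defaults on Kubernetes 1.32 and later
```

With the defaults this prints `40-45` and `50-55`, which together cover the 40 to 55 second range. Any additional wait configured through `.spec.failoverDelay` is not included in this sketch.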

The `Ready` condition flip is not subject to the rate limiters that throttle
pod *eviction* during partial-zonal or large-cluster disruptions
(`--node-eviction-rate`, `--secondary-node-eviction-rate`,
`--unhealthy-zone-threshold`). The operator reacts to the condition flip as
soon as the controller emits the patch, regardless of the zone or cluster-wide
health state.

Pod *eviction* (actual deletion from the unreachable node) is a separate
mechanism, driven by `tolerationSeconds` on the
`node.kubernetes.io/unreachable` `NoExecute` taint (`300s` by default). That
timer does not hold up the operator's failover decision; {{name.ln}}
promotes a new primary as soon as the `Ready` condition flips. By that point
the kubelet on the isolated node has already stopped the old PostgreSQL
container locally: with the default
`.spec.probes.liveness.isolationCheck.enabled: true`, the instance manager
fails its own liveness probe once it can reach neither the API server nor
the rest of the cluster, and the kubelet kills the container within
approximately three probe periods (`~30s`). Full high availability
(recreation of the old primary on a healthy node by the operator) is still
gated on the taint-based eviction actually deleting the pod.

## Failover Quorum (Quorum-based Failover)

Failover quorum is a mechanism that enhances data durability and safety during
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ Use this annotation **with extreme caution** and only during emergency
operations.

!!!warning

This annotation should be removed as soon as the issue is resolved. Leaving
it in place prevents the operator from managing the annotated resource. On a
Cluster, this includes self-healing actions and failover.
Expand Down
32 changes: 32 additions & 0 deletions product_docs/docs/postgres_for_kubernetes/1/image_catalog.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ Both resources share a common schema:
PostgreSQL 18+ via `extension_control_path`).

!!!warning

While the operator trusts the user-defined `major` version without performing
image detection, the official {{name.ln}} catalogs are pre-validated by the
community to ensure that every extension and operand image entry correctly
Expand Down Expand Up @@ -132,6 +133,36 @@ API schema and structure.
Clusters referencing an image catalog can load any of its associated extensions
by name.

!!!info

Refer to the [documentation of image volume extensions](imagevolume_extensions.md)
for details on the internal image structure, configuration options, and
instructions on how to select or override catalog extensions within a cluster.
!!!

[Image Volume Extensions](imagevolume_extensions.md) allow you to bundle
containers for extensions directly within the catalog entry:

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: ImageCatalog
metadata:
name: postgresql
spec:
images:
- major: 18
image: docker.enterprisedb.com/k8s_enterprise/postgresql:18.3-minimal-ubi9
extensions:
- name: foo
image:
reference: # registry path for your `foo` extension image
```

The `extensions` section follows the [`ExtensionConfiguration`](pg4k.v1.md#extensionconfiguration)
API schema and structure.
Clusters referencing an image catalog can load any of its associated extensions
by name.

!!!info
Refer to the [documentation of image volume extensions](imagevolume_extensions.md)
for details on the internal image structure, configuration options, and
Expand All @@ -158,6 +189,7 @@ release (e.g., `trixie`). It lists the most up-to-date container images for
every supported PostgreSQL major version.

!!!important

To ensure maximum security and immutability, all images within official
{{name.ln}} catalogs are identified by their **SHA256 digests** rather than
just tags.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ These images are built on top of the
[official PostgreSQL `minimal` images](https://github.com/enterprisedb/docker-postgres?tab=readme-ov-file#minimal-images).

!!!info

While this documentation provides the necessary technical specifications for
third parties to build their own images and catalogs, the following
instructions focus specifically on the deployment and usage of our official
Expand All @@ -56,6 +57,7 @@ overhead by maintaining an immutable, minimal base image for your data
workloads.

!!!important

Extension images must be built according to the [documented specifications](#image-specifications).
!!!

Expand All @@ -81,6 +83,7 @@ An extension image can be added to a new or existing `Cluster` resource using
the `.spec.postgresql.extensions` stanza.

!!!important

When a new extension is added to a running `Cluster`, {{name.ln}} will
automatically trigger a [rolling update](rolling_update.md) to attach the new
image volume to each pod. Before adding a new extension in production,
Expand All @@ -100,6 +103,7 @@ The `extensions` stanza accepts a list of entries, each requiring a `name` that
must be unique within the cluster.

!!!important

The `name` must consist of lowercase alphanumeric characters, underscores (`_`)
or hyphens (`-`) and must start and end with an alphanumeric character.
!!!
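
As a quick local pre-check before applying a manifest, the naming rule can be approximated with a regular expression. This is only a sketch: the function name `is_valid_extension_name` and the exact pattern are assumptions derived from the rule stated above, and the operator's admission check remains the authority (for example, any length limit is not modeled here).

```sh
# Illustrative check only, derived from the rule described above.
# Returns 0 if $1 uses only lowercase alphanumerics, "_" or "-", and
# starts and ends with an alphanumeric character; returns 1 otherwise.
is_valid_extension_name() {
  # LC_ALL=C keeps the [a-z] range limited to ASCII lowercase letters.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9_-]*[a-z0-9])?$'
}

for name in foo pg-ivm my_ext -foo foo_ Foo; do
  if is_valid_extension_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: rejected"
  fi
done
```

In this sketch `foo`, `pg-ivm`, and `my_ext` pass, while `-foo`, `foo_`, and `Foo` are rejected.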
Expand Down Expand Up @@ -153,6 +157,7 @@ official {{name.ln}} catalogs pre-configure these options to the correct
values for each extension, ensuring they work out-of-the-box.

!!!important

If an extension image includes shared libraries, they must be compiled for the
same PostgreSQL major version, operating system distribution, and CPU
architecture as the operand image. Using official {{name.ln}} catalogs
Expand Down Expand Up @@ -194,6 +199,7 @@ ensures your desired state is maintained and consistently applied across all
instances.

!!!note

Some PostgreSQL components, often referred to as modules, do not use the
`CREATE EXTENSION` mechanism. These typically consist of shared libraries that
must be loaded via `shared_preload_libraries` at server start.
Expand All @@ -213,6 +219,7 @@ production-ready supply chain.
### Via an Image Catalog (Recommended)

!!!info

Support for extension container images in image catalogs was introduced in
{{name.ln}} 1.29.
!!!
Expand Down Expand Up @@ -256,6 +263,7 @@ PostgreSQL operand image defined in the same catalog entry.
### Directly in the Cluster

!!!info

Defining extensions directly in the `Cluster` resource is the original method
and remains the only option for versions prior to {{name.ln}} 1.29.
It is also useful if you need to use an extension not present in your current
Expand Down Expand Up @@ -283,6 +291,7 @@ spec:
```

!!!tip

Remember that configuration provided directly in the `Cluster` takes
precedence. If you reference a catalog but also define the same extension name
in the `Cluster` stanza, the settings in the `Cluster` will override those in
Expand All @@ -293,6 +302,7 @@ can be overridden at the `Cluster` level to provide total flexibility, the
!!!

!!!warning

The `name` serves as the unique identifier; changing it will define a new
extension entry rather than overriding an existing one from a catalog.
!!!
Expand Down Expand Up @@ -491,6 +501,7 @@ spec:
system libraries at runtime.

!!!important

Since `ld_library_path` must be set when the PostgreSQL process starts,
changing this value requires a **cluster restart** for the new value to take
effect.
Expand Down Expand Up @@ -531,6 +542,7 @@ variable of the Postgres process, allowing PostgreSQL to locate these
binaries at runtime.

!!!warning

Since `bin_path` must be set when the PostgreSQL process starts,
changing this value requires a **cluster restart** for the new value to take
effect.
Expand Down Expand Up @@ -587,6 +599,7 @@ In the example above, if the extension is mounted at
dependencies regardless of the specific mount path chosen by the operator.

!!!tip

Unrecognized placeholders (e.g., `${typo}`) are rejected at admission time.
If you need a literal `${...}` in a value, escape it by doubling the dollar
sign: `$${...}`. For example, a value of `$${not_expanded}` will produce the
Expand All @@ -611,6 +624,7 @@ environment variable overwrites a value that was already set by a previous
extension, to help diagnose potential conflicts.

!!!important

**Reserved variables**: Environment variables reserved for operator usage
(names starting with `PG` or `CNP_`, plus `POD_NAME`, `NAMESPACE`, and
`CLUSTER_NAME`) and variables managed by dedicated fields (`PATH` via
Expand All @@ -619,6 +633,7 @@ the `env` field and are rejected at admission time.
!!!
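
To catch reserved names before the admission check rejects a manifest, a small local test along these lines may help. It is a sketch under stated assumptions: the function name `is_reserved_env_name` is hypothetical, matching is assumed to be case-sensitive, and only the operator-reserved names listed above are covered (variables managed by dedicated fields, such as `PATH`, are not handled).

```sh
# Illustrative check covering only the operator-reserved names listed
# above. Returns 0 if $1 is reserved, 1 otherwise.
# Case-sensitive matching is an assumption, not confirmed by the docs.
is_reserved_env_name() {
  case "$1" in
    PG*|CNP_*|POD_NAME|NAMESPACE|CLUSTER_NAME) return 0 ;;
    *) return 1 ;;
  esac
}

for name in PGHOST CNP_DEBUG POD_NAME MY_EXTENSION_HOME; do
  if is_reserved_env_name "$name"; then
    echo "$name: reserved, expected to be rejected"
  else
    echo "$name: not in the reserved list"
  fi
done
```

Here `PGHOST`, `CNP_DEBUG`, and `POD_NAME` are reported as reserved, while `MY_EXTENSION_HOME` is not.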

!!!warning

**Manual Restart Required**: Because environment variables are injected when
the PostgreSQL process starts, any changes to the `env` section **require a
cluster restart**. {{name.ln}} does **not** automatically trigger a rollout
Expand All @@ -642,6 +657,7 @@ discoverable and usable by PostgreSQL within {{name.ln}} without requiring
manual configuration.

!!!important

We encourage PostgreSQL extension developers and third-party providers to
publish OCI-compliant extension images following this layout.
For practical implementation details, we recommend reviewing the
Expand Down
2 changes: 1 addition & 1 deletion product_docs/docs/postgres_for_kubernetes/1/index.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ description: The {{name.ln}} operator is a fork based on CloudNativePG™ which
originalFilePath: src/index.md
indexCards: none
directoryDefaults:
version: "1.29.0"
version: "1.29.1"
redirects:
- /postgres_for_kubernetes/preview/:splat
navigation:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -69,12 +69,12 @@ kubectl create secret -n postgresql-operator-system docker-registry edb-pull-sec
Now that the pull-secret has been added to the namespace, the operator can be installed like any other resource in Kubernetes,
through a YAML manifest applied via `kubectl`.

You can install the [latest operator manifest](https://get.enterprisedb.io/pg4k/pg4k-1.29.0.yaml)
You can install the [latest operator manifest](https://get.enterprisedb.io/pg4k/pg4k-1.29.1.yaml)
for this minor release as follows:

```sh
kubectl apply --server-side -f \
https://get.enterprisedb.io/pg4k/pg4k-1.29.0.yaml
https://get.enterprisedb.io/pg4k/pg4k-1.29.1.yaml
```

You can verify that with:
Expand Down Expand Up @@ -279,6 +279,22 @@ When versions are not directly upgradable, the old version needs to be
removed before installing the new one. This affects only the operator itself,
not user data.

### Upgrading to 1.29.1, 1.28.3, or 1.25.8

Versions 1.29.1, 1.28.3, and 1.25.8 ship the fix for `CVE-2026-44477` /
`GHSA-423p-g724-fr39`. The metrics exporter now authenticates as a
dedicated `cnp_metrics_exporter` role with `pg_monitor` privileges
only, instead of the `postgres` superuser.

Custom monitoring queries that read user-owned tables, or use
`target_databases: '*'` against databases where `PUBLIC` `CONNECT`
has been revoked, need explicit `GRANT` statements to
`cnp_metrics_exporter`. See ["Custom query privileges and
safety"](monitoring.md#custom-query-privileges-and-safety) and ["Manually creating
the metrics exporter
role"](monitoring.md#manually-creating-the-metrics-exporter-role) in
the monitoring documentation.
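
As a rough illustration of what such grants might look like, the helper below prints candidate statements for a single table. It is a sketch, not the documented procedure: the function name `metrics_grants` and the database, schema, and table names are hypothetical, and the linked monitoring sections remain the authoritative reference.

```sh
# Illustrative generator: prints GRANT statements that a custom query on
# a user-owned table may need. Identifiers are emitted unquoted, so pass
# simple lowercase names only.
#   $1: database name   $2: schema name   $3: table name
metrics_grants() {
  db=$1
  schema=$2
  table=$3
  cat <<EOF
GRANT CONNECT ON DATABASE $db TO cnp_metrics_exporter;
GRANT USAGE ON SCHEMA $schema TO cnp_metrics_exporter;
GRANT SELECT ON TABLE $schema.$table TO cnp_metrics_exporter;
EOF
}

# Hypothetical names, for illustration only.
metrics_grants app public orders
```

The `CONNECT` grant only matters where `PUBLIC` `CONNECT` has been revoked, and the schema and table grants need to be run while connected to the target database. Review the generated statements before executing them.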

### Upgrading to 1.29.0 or 1.28.x

!!!info Important
Expand Down