Merged
159 commits
9ba18b3
Merge pull request #13 from NCAR/hua-work-dbms
zaihuaji Jun 17, 2025
8284215
Merge pull request #14 from NCAR/hua-work-dbms
zaihuaji Jun 18, 2025
bdceecb
Move the cluster and service out to create secrets first
NicholasCote Jun 18, 2025
5adfeb4
update secret store name to be rda-ro
NicholasCote Jun 23, 2025
f82de55
add in the service and cluster
NicholasCote Jun 23, 2025
ed1b0b3
Update postgres settings to remove fixed parameters from cnpg
NicholasCote Jun 23, 2025
14cff9f
only need 1 instance
NicholasCote Jun 24, 2025
4a65e72
Change pgdb01 wal_level to logical
NicholasCote Jun 26, 2025
6b27400
02 to logical wal_level
NicholasCote Jun 26, 2025
b4b6558
All cirrus pgdbs to 2 instances
NicholasCote Jun 26, 2025
6b6193a
Merge pull request #15 from NCAR/hua-work-dbms
zaihuaji Jul 1, 2025
7ce1d16
Merge pull request #16 from NCAR/hua-work-dbms
zaihuaji Jul 11, 2025
3968cf0
Merge pull request #17 from NCAR/hua-work-dbms
zaihuaji Jul 11, 2025
1fced41
update postgres01
NicholasCote Jul 14, 2025
6cdf347
Change wal_level back to replica
NicholasCote Jul 18, 2025
c670016
Add replication user to eso
NicholasCote Jul 21, 2025
336b5c6
try to add replication to pgdb02
NicholasCote Jul 22, 2025
92952c0
hardcode replication username
NicholasCote Jul 22, 2025
3f96393
user to string
NicholasCote Jul 22, 2025
99fd0a9
Fix indentation for the password
NicholasCote Jul 22, 2025
a123dff
add pg_basebackup
NicholasCote Jul 22, 2025
9977d1e
Update wal senders and replication slots
NicholasCote Jul 22, 2025
afc9c8b
Reduce WAL size to see if that helps the issue
NicholasCote Jul 31, 2025
9ba1846
from 7 to 8GB
NicholasCote Jul 31, 2025
076b485
Increase WAL retention
NicholasCote Aug 4, 2025
e555f10
8 to 9
NicholasCote Aug 4, 2025
29f1a58
Scale down to 1 replica
NicholasCote Aug 4, 2025
f66278d
Add protection for the pgdb03-4 PVC
NicholasCote Aug 4, 2025
f22e112
remove the owners reference
NicholasCote Aug 4, 2025
1f68a46
remove the pgdb03 cluster
NicholasCote Aug 4, 2025
dc8a34f
Try to roll out simple cluster to get everything healthy
NicholasCote Aug 4, 2025
4b7e4c3
Switch back to the original cluster yaml with 1 instance
NicholasCote Aug 4, 2025
25fdc9b
Switch the PVC back to cluster ownership
NicholasCote Aug 4, 2025
81cfea6
remove pvc protection and trigger the PVC to be owned by CNPG cluster…
NicholasCote Aug 4, 2025
4da9706
Add the PVC back
NicholasCote Aug 4, 2025
65896c1
Try this strong deletion
NicholasCote Aug 4, 2025
5722842
remove the pvc file again
NicholasCote Aug 4, 2025
397c976
scale to 2 instances
NicholasCote Aug 4, 2025
c8d03a1
rename the cluster
NicholasCote Aug 4, 2025
67c800f
don't rename
NicholasCote Aug 4, 2025
46c6d9b
Try and fix the PVC pending deletion
NicholasCote Aug 4, 2025
6c0958c
Try to switch the PVC to remove the deletion
NicholasCote Aug 4, 2025
6a512d7
remove the cluster
NicholasCote Aug 4, 2025
5bb166d
Just try to redeploy
NicholasCote Aug 4, 2025
6512904
remove the cluster again
NicholasCote Aug 4, 2025
620b37a
Restart with a clean 03
NicholasCote Aug 4, 2025
483edba
bak
NicholasCote Aug 4, 2025
1f96432
Try and force it to start fresh
NicholasCote Aug 4, 2025
7457153
remove the skip archive check line
NicholasCote Aug 4, 2025
6678fb4
Scale to 2 instances
NicholasCote Aug 4, 2025
e12c16e
add a weekly backup
NicholasCote Aug 4, 2025
190946c
Remove snapshot as we need to install the operator
NicholasCote Aug 4, 2025
a8714cf
Add resource limits for 03 and tweak some settings to get better reso…
NicholasCote Aug 6, 2025
cf042d8
Add single quotes
NicholasCote Aug 6, 2025
d891b30
Merge pull request #18 from NCAR/hua-work-dbms
zaihuaji Aug 13, 2025
c786f91
Merge pull request #19 from NCAR/hua-work-dbms
zaihuaji Aug 13, 2025
1d7dbed
Merge pull request #20 from NCAR/hua-work-dbms
zaihuaji Aug 19, 2025
5e1e50a
Merge pull request #21 from NCAR/hua-work-dbms
zaihuaji Aug 20, 2025
6f22c62
Add weekly backup to pgdb01 now that new csi driver is in place
NicholasCote Aug 20, 2025
dbff4d3
move retention policy to cluster yaml
NicholasCote Aug 20, 2025
988f57c
8 -> 8w
NicholasCote Aug 20, 2025
8480387
Merge pull request #22 from NCAR/hua-work-dbms
zaihuaji Aug 21, 2025
c17b519
Add K8s network CIDR for replications allow
NicholasCote Aug 27, 2025
e7e43a1
Add some annotations to try and get the DB to reconcile
NicholasCote Aug 27, 2025
7a4e209
Add annotations to right cluster
NicholasCote Aug 27, 2025
4ce9016
try to trigger a resync
NicholasCote Aug 27, 2025
776772f
Comment out the replica info for 02
NicholasCote Aug 27, 2025
e0d03b6
Uncomment the replica and bootstrap info
NicholasCote Aug 27, 2025
4e1a9d8
Remove some bloat
NicholasCote Aug 27, 2025
33a021e
Add resource limits to 01 & 02 that match 03 (16/128)
NicholasCote Aug 27, 2025
18abfbb
add | quote
NicholasCote Aug 27, 2025
96a9a8a
scale pgdb01 down to 1 instance
NicholasCote Aug 27, 2025
c8024d4
remove quotes?
NicholasCote Aug 27, 2025
02012a6
Fix the quote
NicholasCote Aug 27, 2025
bda061c
scale back up to 2
NicholasCote Aug 27, 2025
6bb885f
Update WAL settings to increase keep size
NicholasCote Aug 28, 2025
8583dfb
set to the default value
NicholasCote Aug 28, 2025
4d32a01
wal keep size to 1024
NicholasCote Aug 28, 2025
fb60485
change the backup to 02 and increase max slot wal keep size
NicholasCote Aug 28, 2025
c306816
add 1 TB to volume
NicholasCote Aug 28, 2025
e42e256
increase size and scale down to 1 instance
NicholasCote Aug 28, 2025
ae8af81
remove the backup for now and increase pgdb01 to 3 instances to clean up
NicholasCote Aug 29, 2025
a4ac68e
pgdb01 back down to 2 instances
NicholasCote Aug 29, 2025
5c970ab
pgdb01 scale down to 1
NicholasCote Aug 29, 2025
23f0107
Merge pull request #23 from NCAR/hua-work-dbms
zaihuaji Aug 31, 2025
75ab185
reenable snapshot backups on pgdb02
khrpcek-ucar Sep 2, 2025
ecce200
increase pgdb01 instances to 2
khrpcek-ucar Sep 2, 2025
ccbf3ee
Merge pull request #24 from NCAR/hua-work-dbms
zaihuaji Sep 10, 2025
d2c3543
Merge pull request #25 from NCAR/hua-work-dbms
zaihuaji Sep 10, 2025
50e5125
Merge pull request #26 from NCAR/hua-work-dbms
zaihuaji Sep 15, 2025
6801612
Merge pull request #27 from NCAR/hua-work-dbms
zaihuaji Sep 15, 2025
5b1910d
Merge pull request #28 from NCAR/hua-work-dbms
zaihuaji Sep 15, 2025
df79cd6
Merge pull request #29 from NCAR/hua-work-dbms
zaihuaji Sep 16, 2025
6606645
Merge pull request #30 from NCAR/hua-work-dbms
zaihuaji Sep 16, 2025
2417785
Merge pull request #31 from NCAR/hua-work-dbms
zaihuaji Sep 19, 2025
e0e2114
add backups external secret
khrpcek-ucar Sep 24, 2025
0d1209b
add s3 backups
khrpcek-ucar Sep 24, 2025
9f1edfd
add s3 backups
khrpcek-ucar Sep 24, 2025
61a63b4
add s3 backups
khrpcek-ucar Sep 24, 2025
2056827
add s3 backups
khrpcek-ucar Sep 24, 2025
750d43a
add s3 backups
khrpcek-ucar Sep 24, 2025
19af08c
Merge pull request #32 from NCAR/hua-work-dbms
zaihuaji Sep 29, 2025
f7ddcc4
test prometheus metrics
khrpcek-ucar Sep 29, 2025
fb47e2d
Merge branch 'main' of github.com:NCAR/rda-python-dbms
khrpcek-ucar Sep 29, 2025
3f02906
remove pgdb02 local replica because pgdb02 needs to be rebuilt
khrpcek-ucar Sep 30, 2025
7efc068
Merge pull request #33 from NCAR/hua-work-dbms
zaihuaji Sep 30, 2025
96bbde9
Merge pull request #34 from NCAR/hua-work-dbms
zaihuaji Sep 30, 2025
d134926
Merge pull request #35 from NCAR/hua-work-dbms
zaihuaji Sep 30, 2025
2c9d2ec
add backup spec to pgdb01
khrpcek-ucar Sep 30, 2025
e2db53e
Merge branch 'main' of github.com:NCAR/rda-python-dbms
khrpcek-ucar Sep 30, 2025
9048048
add backup spec to pgdb01
khrpcek-ucar Sep 30, 2025
ebb9431
update barman backup to use compression
khrpcek-ucar Oct 1, 2025
f8e7bcd
pgdb02 back to 2 instances and disable s3 backups
khrpcek-ucar Oct 9, 2025
37da59e
add pgdb04 for ml outage
khrpcek-ucar Oct 9, 2025
a2af664
typo
khrpcek-ucar Oct 9, 2025
83be26c
pgdb04 secret futz
khrpcek-ucar Oct 9, 2025
ab50eed
pgdb03: increase max wal senders
khrpcek-ucar Oct 9, 2025
0ccf75a
increase pgdb04 instances to 2
khrpcek-ucar Oct 13, 2025
d6ced0b
pgdb02 backup fixes
khrpcek-ucar Oct 14, 2025
db3a2f4
pgdb01: disable backups
khrpcek-ucar Nov 6, 2025
6c75aca
pgdb01: disable backups
khrpcek-ucar Nov 7, 2025
cd83195
pgdb02: update chart and remove barman config
khrpcek-ucar Nov 10, 2025
b7e8b20
pgdb02: remove primary stream slot name
khrpcek-ucar Nov 10, 2025
251df28
add barman plugin method
khrpcek-ucar Nov 12, 2025
cb6f1a1
pgdb02: remove legacy barman config and add objectstore
khrpcek-ucar Nov 20, 2025
fa00b35
pgdb02: forgot this file
khrpcek-ucar Nov 20, 2025
fac40c4
pgdb02: forgot this file
khrpcek-ucar Nov 20, 2025
eb3a732
pgdb02: update s3 scheduledbackup
khrpcek-ucar Nov 20, 2025
d07c64e
pgdb02: add backup compressor
khrpcek-ucar Nov 24, 2025
c9a9914
pgdb02: give s3 backups more time before compression starts
khrpcek-ucar Nov 25, 2025
3229f5a
add retention policy to s3 backups
khrpcek-ucar Dec 3, 2025
283cad7
Merge pull request #36 from NCAR/hua-work-dbms
zaihuaji Dec 16, 2025
75cae7b
Merge pull request #37 from NCAR/hua-work-dbms
zaihuaji Dec 16, 2025
30b68bb
Merge pull request #38 from NCAR/hua-work-dbms
zaihuaji Jan 12, 2026
1242fa8
update external secret api
khrpcek-ucar Jan 29, 2026
01b21af
Merge pull request #39 from NCAR/hua-work-dbms
zaihuaji Feb 5, 2026
29ce54e
Disable S3 backups until plugin cert issue is fixed
NicholasCote Feb 6, 2026
4c07325
Update values.yaml
NicholasCote Feb 9, 2026
1fb59ad
Update postgres_cluster.yaml
NicholasCote Feb 9, 2026
a52793f
Merge pull request #40 from NCAR/hua-work-dbms
zaihuaji Feb 10, 2026
d87ae32
Merge pull request #41 from NCAR/hua-work-dbms
zaihuaji Feb 10, 2026
f90bae9
Merge pull request #42 from NCAR/hua-work-dbms
zaihuaji Feb 10, 2026
a08a351
Add alert rules for replication
NicholasCote Feb 10, 2026
f507db8
Add if s3 enabled
NicholasCote Feb 10, 2026
c254d9d
Update alerts for Go syntax for {{}} variables from grafana
NicholasCote Feb 10, 2026
ca99fcb
switch team name to gdex
NicholasCote Feb 10, 2026
40b3d85
Update alert configuration
NicholasCote Feb 12, 2026
21491af
Escape variables for Golang
NicholasCote Feb 12, 2026
a085df7
Add release label
NicholasCote Feb 12, 2026
c4974f2
Fix replication broken, this is inside a cluster, so 1 replica will t…
NicholasCote Feb 12, 2026
45a89b2
Add alerts to nwc1
NicholasCote Feb 12, 2026
281845f
Merge pull request #44 from NCAR/hua-work-dbms
zaihuaji Feb 12, 2026
6fe61e7
fix some if statements, add s3 backup to pgdb03 for testing
NicholasCote Mar 6, 2026
0b1ec37
add s3 eso
NicholasCote Mar 6, 2026
052ec80
change secret name
NicholasCote Mar 6, 2026
6623351
add null route for info alerts to decrease tds noise
NicholasCote Mar 6, 2026
355f702
add 03 s3 backup retention period and a schedule backup
NicholasCote Mar 6, 2026
2f3a7ec
put the retention policy in the right place
NicholasCote Mar 6, 2026
90fd92a
switch to gzip to stream compression instead of compressing locally.
NicholasCote Mar 6, 2026
34 changes: 34 additions & 0 deletions pgdb01-cirrus/templates/alert-email.yaml
@@ -0,0 +1,34 @@
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: gdex-app-team
  namespace: rda
  labels:
    alertmanagerConfig: gdex
    namespace: rda
spec:
  route:
    receiver: gdex-app-team
    groupBy:
      - alertname
    groupWait: 10s
    groupInterval: 1m
    repeatInterval: 60m
    matchers:
      - name: namespace
        value: rda
        matchType: "="
    routes:
      - receiver: "null"
        matchers:
          - name: alertname
            value: InfoInhibitor
            matchType: "="

  receivers:
    - name: gdex-app-team
      emailConfigs:
        - to: decs-info@ucar.edu
          from: alertmanager@k8s.ucar.edu
          smarthost: vdir.ucar.edu:25
    - name: "null"
48 changes: 48 additions & 0 deletions pgdb01-cirrus/templates/alert-rule.yaml
@@ -0,0 +1,48 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gdex-pg-replication-alerts
  namespace: rda
  labels:
    team: gdex-app-team
    release: kube-prometheus-stack
spec:
  groups:
    - name: pg.replication
      interval: 60s
      rules:
        - alert: PGReplicationLagHigh
          expr: |
            cnpg_pg_replication_lag{namespace="rda"} > 100
          for: 15m
          labels:
            severity: warning
            team: gdex-app-team
            namespace: rda
          annotations:
            summary: "PostgresDB replication lag high on {{`{{ $labels.pod }}`}}"
            description: "Replication lag is {{`{{ $value }}`}} WAL segments behind on {{`{{ $labels.pod }}`}} in cluster {{`{{ $labels.cluster }}`}}"

        - alert: PGReplicationBroken
          expr: |
            cnpg_pg_replication_streaming_replicas{namespace="rda"} == 0
          for: 5m
          labels:
            severity: critical
            team: gdex-app-team
            namespace: rda
          annotations:
            summary: "PostgresDB cluster replication broken for {{`{{ $labels.cluster }}`}}"
            description: "Cluster {{`{{ $labels.cluster }}`}} has no streaming replicas. Replication may be broken."

        - alert: PGClusterNotHealthy
          expr: |
            cnpg_collector_up{namespace="rda"} == 0
          for: 5m
          labels:
            severity: critical
            team: gdex-app-team
            namespace: rda
          annotations:
            summary: "PostgresDB cluster {{`{{ $labels.cluster }}`}} is not healthy"
            description: "The Postgres exporter for cluster {{`{{ $labels.cluster }}`}} is down, indicating cluster health issues"
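Review note: the `PGReplicationBroken` expression in this file matches every pod in the `rda` namespace that reports zero streaming replicas, which may include standby pods. Below is a minimal sketch of a narrower rule. It rests on two assumptions that this PR does not confirm: that the exporter also publishes `cnpg_pg_replication_in_recovery` (0 on a primary), and that `cnpg_pg_replication_lag` is measured in seconds rather than WAL segments. The thresholds are illustrative.

```yaml
# Sketch only: metric names and units are assumptions, verify them against
# the metrics your exporter actually serves before adopting this.
- alert: PGReplicationBroken
  expr: |
    cnpg_pg_replication_streaming_replicas{namespace="rda"} == 0
    and on (namespace, pod)
    cnpg_pg_replication_in_recovery{namespace="rda"} == 0
  for: 5m
  labels:
    severity: critical
    team: gdex-app-team
    namespace: rda

- alert: PGReplicationLagHigh
  # Assumes the lag gauge is in seconds, so 100 means 100 seconds behind.
  expr: |
    cnpg_pg_replication_lag{namespace="rda"} > 100
  for: 15m
  labels:
    severity: warning
    team: gdex-app-team
    namespace: rda
```

The `and on (namespace, pod)` join keeps only series from pods that are not in recovery, so a healthy standby would no longer trigger the alert. A single-instance cluster would still fire, because its primary legitimately has no replicas.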
25 changes: 25 additions & 0 deletions pgdb01-cirrus/templates/backups_external_secret.yaml
@@ -0,0 +1,25 @@
{{- if .Values.db.backups.s3.enabled }}
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: backup-s3-creds
  namespace: {{ .Release.Namespace }}
spec:
  data:
    - remoteRef:
        key: {{ .Values.db.backups.s3.secretPath }}
        property: access_key
      secretKey: access_key
    - remoteRef:
        key: {{ .Values.db.backups.s3.secretPath }}
        property: secret_key
      secretKey: secret_key
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: rda-ro
  target:
    creationPolicy: Owner
    deletionPolicy: Retain
    name: {{ .Values.db.backups.s3.secretName }}
{{- end -}}
57 changes: 38 additions & 19 deletions pgdb01-cirrus/templates/postgres_cluster.yaml
@@ -10,6 +10,34 @@ spec:
  instances: {{ .Values.db.instances }}
  storage:
    size: {{ .Values.db.size }}
  resources:
    limits:
      cpu: {{ .Values.db.resource.limits.cpu | quote }}
      memory: {{ .Values.db.resource.limits.memory }}

  {{- if .Values.db.backups }}
  backup:
    {{- if .Values.db.backups.volumeSnapshot }}
    volumeSnapshot:
      className: {{ .Values.db.backups.volumeSnapshot.snapshotClassName }}
    {{- end }}
    {{- if .Values.db.backups.s3.enabled }}
    barmanObjectStore:
      wal:
        compression: bzip2
      data:
        compression: bzip2
      destinationPath: {{ .Values.db.backups.s3.destinationPath }}
      endpointURL: {{ .Values.db.backups.s3.endpointURL }}
      s3Credentials:
        accessKeyId:
          name: {{ .Values.db.backups.s3.secretName }}
          key: access_key
        secretAccessKey:
          name: {{ .Values.db.backups.s3.secretName }}
          key: secret_key
    {{- end }}
  {{- end }}

  # Add TLS certificates for encrypted communication
  certificates:
@@ -22,19 +50,14 @@ spec:
  # Configure postgres superuser from su_external_secret
  superuserSecret:
    name: "{{ .Values.db.name }}-superuser"

  # Allow outside hosts to connect to the database

  postgresql:
    parameters:
      # Connection settings
      listen_addresses: "*"
      port: "5432"
      max_connections: "500"

      # SSL Configuration
      ssl: "on"
      ssl_ciphers: "HIGH:!aNULL"
      ssl_prefer_server_ciphers: "on"
      ssl_min_protocol_version: "TLSv1.3"

      # Memory settings
@@ -55,23 +78,16 @@ spec:
      min_wal_size: "1GB"

      # Replication settings
      max_wal_senders: "3"
      max_replication_slots: "3"
      wal_keep_size: "256"
      max_slot_wal_keep_size: "-1"
      hot_standby: "on"
      max_wal_senders: "6"
      max_replication_slots: "6"
      wal_keep_size: "1024MB"
      max_slot_wal_keep_size: "100GB"
      max_standby_archive_delay: "-1"
      max_standby_streaming_delay: "-1"

      # Logging settings
      log_destination: "stderr"
      logging_collector: "on"
      log_directory: "log"
      log_filename: "postgresql-%Y-%m-%d_%H%M%S.log"
      log_file_mode: "0644"
      log_rotation_age: "0"
      log_rotation_size: "1GB"
      log_truncate_on_rotation: "off"
      log_min_duration_statement: "120000"
      log_line_prefix: "%t %a [%p] "
      log_timezone: "America/Denver"
@@ -87,7 +103,7 @@ spec:

      # Lock management
      max_locks_per_transaction: "1024"

    pg_hba:
      # Local connections with md5 authentication
      - local all root md5
@@ -107,4 +123,7 @@ spec:
      - host replication all 127.0.0.1/32 md5

      # Remote replication
      - host replication all 128.117.0.0/16 trust
      - host replication all 128.117.0.0/16 trust

      # Remote replication from Kubernetes pod network
      - host replication all 10.0.0.0/16 trust
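Review note: the `backup:` block in this template is guarded by `.Values.db.backups`, which is truthy whenever the map has any key, and it reads `.Values.db.backups.s3.enabled` directly, which fails at render time if `s3` is absent. The following is a minimal sketch of a more defensive guard, assuming the intent is to emit `backup:` only when a snapshot class or S3 is actually configured. The variable names `$backups` and `$s3On` are illustrative and not part of the PR.

```yaml
{{- /* Sketch only: tolerate missing db.backups and db.backups.s3 keys */ -}}
{{- $backups := .Values.db.backups | default dict }}
{{- $s3On := dig "s3" "enabled" false $backups }}
{{- if or $backups.volumeSnapshot $s3On }}
  backup:
    {{- with $backups.volumeSnapshot }}
    volumeSnapshot:
      className: {{ .snapshotClassName }}
    {{- end }}
    {{- if $s3On }}
    barmanObjectStore:
      wal:
        compression: bzip2
      data:
        compression: bzip2
      destinationPath: {{ $backups.s3.destinationPath }}
      endpointURL: {{ $backups.s3.endpointURL }}
      s3Credentials:
        accessKeyId:
          name: {{ $backups.s3.secretName }}
          key: access_key
        secretAccessKey:
          name: {{ $backups.s3.secretName }}
          key: secret_key
    {{- end }}
{{- end }}
```

`dig` returns the supplied default (`false` here) when any key on the path is missing, so the template renders cleanly for values files that omit the `s3` map.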
5 changes: 2 additions & 3 deletions pgdb01-cirrus/templates/su_external_secret.yaml
@@ -1,16 +1,15 @@
-apiVersion: external-secrets.io/v1beta1
+apiVersion: external-secrets.io/v1
 kind: ExternalSecret
 metadata:
   name: {{ .Values.db.name }}-superuser-esos
   namespace: {{ .Release.Namespace }}
 spec:
   refreshInterval: 1h
   secretStoreRef:
-    name: user-ro
+    name: rda-ro
     kind: SecretStore
   target:
     name: {{ .Values.db.name }}-superuser
-    type: kubernetes.io/basic-auth
   data:
     - secretKey: username
       remoteRef:
15 changes: 12 additions & 3 deletions pgdb01-cirrus/values.yaml
@@ -1,9 +1,18 @@
 db:
   name: pgdb01
   group: pgdb01
-  instances: 3
-  size: 5000Gi
+  instances: 2
+  size: 8000Gi
   superUser:
     usernameKey: username
     passwordKey: password
-    secretPath: gdex/pgdb01
+    secretPath: gdex/pgdb01
+  resource:
+    limits:
+      cpu: '16'
+      memory: 128Gi
+
+  backups:
+    enabled: false
+    s3:
+      enabled: false
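Review note: the pgdb01 templates in this PR read several `db.backups` keys that this values file does not set while S3 is disabled. Below is a sketch of what an enabled configuration might look like. Only the key names come from the templates; every value is a hypothetical placeholder.

```yaml
# Sketch only: all values below are placeholders, none are taken from this PR.
db:
  backups:
    enabled: true
    volumeSnapshot:
      snapshotClassName: example-snapshot-class     # hypothetical VolumeSnapshotClass
    s3:
      enabled: true
      secretPath: example/path/to/s3-creds          # hypothetical; key read by the ExternalSecret remoteRef
      secretName: example-backup-s3-secret          # hypothetical; Secret the ExternalSecret creates
      destinationPath: s3://example-bucket/backups  # hypothetical bucket path
      endpointURL: https://s3.example.org           # hypothetical S3 endpoint
```

None of the templates shown here reads `db.backups.enabled`. The S3 pieces are switched by `db.backups.s3.enabled`, and the snapshot piece by the presence of `db.backups.volumeSnapshot`.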
34 changes: 34 additions & 0 deletions pgdb02-cirrus/templates/alert-email.yaml
@@ -0,0 +1,34 @@
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: gdex-app-team
  namespace: rda
  labels:
    alertmanagerConfig: gdex
    namespace: rda
spec:
  route:
    receiver: gdex-app-team
    groupBy:
      - alertname
    groupWait: 10s
    groupInterval: 1m
    repeatInterval: 60m
    matchers:
      - name: namespace
        value: rda
        matchType: "="
    routes:
      - receiver: "null"
        matchers:
          - name: alertname
            value: InfoInhibitor
            matchType: "="

  receivers:
    - name: gdex-app-team
      emailConfigs:
        - to: decs-info@ucar.edu
          from: alertmanager@k8s.ucar.edu
          smarthost: vdir.ucar.edu:25
    - name: "null"
48 changes: 48 additions & 0 deletions pgdb02-cirrus/templates/alert-rule.yaml
@@ -0,0 +1,48 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gdex-pg-replication-alerts
  namespace: rda
  labels:
    team: gdex-app-team
    release: kube-prometheus-stack
spec:
  groups:
    - name: pg.replication
      interval: 60s
      rules:
        - alert: PGReplicationLagHigh
          expr: |
            cnpg_pg_replication_lag{namespace="rda"} > 100
          for: 15m
          labels:
            severity: warning
            team: gdex-app-team
            namespace: rda
          annotations:
            summary: "PostgresDB replication lag high on {{`{{ $labels.pod }}`}}"
            description: "Replication lag is {{`{{ $value }}`}} WAL segments behind on {{`{{ $labels.pod }}`}} in cluster {{`{{ $labels.cluster }}`}}"

        - alert: PGReplicationBroken
          expr: |
            cnpg_pg_replication_streaming_replicas{namespace="rda", cluster="pgdb03"} == 0
          for: 5m
          labels:
            severity: critical
            team: gdex-app-team
            namespace: rda
          annotations:
            summary: "PostgresDB cluster replication broken for {{`{{ $labels.cluster }}`}}"
            description: "Cluster {{`{{ $labels.cluster }}`}} has no streaming replicas. Replication may be broken."

        - alert: PGClusterNotHealthy
          expr: |
            cnpg_collector_up{namespace="rda"} == 0
          for: 5m
          labels:
            severity: critical
            team: gdex-app-team
            namespace: rda
          annotations:
            summary: "PostgresDB cluster {{`{{ $labels.cluster }}`}} is not healthy"
            description: "The Postgres exporter for cluster {{`{{ $labels.cluster }}`}} is down, indicating cluster health issues"
40 changes: 40 additions & 0 deletions pgdb02-cirrus/templates/backup_compressor.yaml
@@ -0,0 +1,40 @@
{{- if .Values.db.backups.s3.enabled }}
apiVersion: batch/v1
kind: CronJob
metadata:
  name: {{ .Values.db.name }}-s3-backup-compressor
spec:
  schedule: "0 12 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: compressor
              image: hub.k8s.ucar.edu/khrpcek/backup-compressor:kmh
              command: ["/multi_compress.sh"]
              imagePullPolicy: Always
              env:
                - name: BASEDIR
                  value: "{{ .Values.db.backups.s3.destinationPath }}/{{ .Values.db.name }}/base/"
                - name: AWS_ENDPOINT_URL
                  value: "{{ .Values.db.backups.s3.endpointURL }}"
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: {{ .Values.db.backups.s3.secretName }}
                      key: access_key
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: {{ .Values.db.backups.s3.secretName }}
                      key: secret_key
              resources:
                requests:
                  memory: 12Gi
                  cpu: 30
                limits:
                  memory: 16Gi
                  cpu: 32
{{- end }}
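Review note: the compressor CronJob runs daily at 12:00 and requests 30 CPUs, so overlapping or repeatedly retried runs could be costly. Below is a sketch of scheduling fields that might be added to its `spec`. None of these fields is in the PR, and the chosen values are assumptions rather than tested settings.

```yaml
# Sketch only: fields and values below are suggestions, not part of this PR.
spec:
  schedule: "0 12 * * *"
  timeZone: "America/Denver"     # assumption: align with log_timezone; if unset, the controller manager's zone is used
  concurrencyPolicy: Forbid      # do not start a new run while the previous one is still compressing
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2            # cap retries of a failing compression run
```

`concurrencyPolicy: Forbid` skips a scheduled run when the previous Job is still active, which avoids two compressors working on the same `BASEDIR` at once.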