Runbook: Postgres Backup and Restore
When to use this
Use this runbook to take a point-in-time snapshot of the Alexandria EE Postgres database before upgrades, as part of a scheduled backup policy, or to restore from a snapshot after data loss or a failed migration.
Postgres is always external to the cluster — there is no in-cluster database pod. All data is in the external Postgres instance referenced by the DSN secret.
Pre-checks
- Locate the database DSN. It is stored in one of two places:
  - Kubernetes Secret:
    kubectl get secret alexandria-ee -n <namespace> -o jsonpath='{.data.database-dsn}' | base64 -d
  - Vault KV v2 (when vault.enabled=true): check the path under vault.prefix (default: alexandria/llm/) in your site values.
- Confirm the backup destination has enough disk space (pg_dump output is typically 10–50% of raw table size).
- If running alex backup (Quadlet/CLI installs): this backs up config files, tools, and the knowledge store. It does NOT back up the Postgres database. Run pg_dump separately for the DB.
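The disk-space pre-check can be sketched roughly as follows. This is an illustration, not part of the Alexandria tooling: the helper names estimate_dump_upper_bound and has_free_space are hypothetical, and using pg_database_size() as a stand-in for "raw table size" is an assumption (it also counts indexes, so the estimate errs on the high side).

```shell
# Sketch only: apply the 10-50% rule of thumb to decide whether a dump will fit.
# Assumption: helper names are illustrative and not shipped with Alexandria.

# Print the upper-bound dump size (50% of the raw size) in bytes.
estimate_dump_upper_bound() {
  local raw_bytes="$1"
  echo $(( raw_bytes / 2 ))
}

# Succeed if the filesystem holding dest_dir has at least needed_bytes free.
has_free_space() {
  local dest_dir="$1" needed_bytes="$2" free_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
  free_kb=$(df -Pk "$dest_dir" | awk 'NR == 2 { print $4 }')
  [ $(( free_kb * 1024 )) -ge "$needed_bytes" ]
}

# Example usage (requires psql and a populated $DSN):
#   RAW=$(psql "$DSN" -At -c "SELECT pg_database_size(current_database());")
#   has_free_space . "$(estimate_dump_upper_bound "$RAW")" || echo "not enough space" >&2
```

Checking against the 50% bound rather than the 10% bound leaves headroom when the data compresses poorly.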
Procedure
Taking a backup
# Export the DSN from the k8s Secret
DSN=$(kubectl get secret alexandria-ee -n <namespace> \
  -o jsonpath='{.data.database-dsn}' | base64 -d)
# Snapshot to a timestamped file
TIMESTAMP=$(date +%Y%m%dT%H%M%S)
pg_dump "$DSN" \
  --format=custom \
  --compress=9 \
  --file="alexandria-ee-${TIMESTAMP}.pgdump"
Store the .pgdump file in durable storage (GCS bucket, S3, encrypted volume) before proceeding with any upgrade or migration.
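Before uploading, it may be worth confirming that the dump is readable. The sketch below assumes pg_restore and GNU sha256sum are on the PATH (on macOS, shasum -a 256 is the usual substitute); verify_dump is a hypothetical helper name, not an Alexandria command.

```shell
# Sketch: sanity-check a custom-format dump before relying on it.
verify_dump() {
  local dump_file="$1"
  # Refuse missing or empty files early.
  [ -s "$dump_file" ] || { echo "missing or empty: $dump_file" >&2; return 1; }
  # pg_restore --list reads the archive's table of contents without connecting to a database.
  pg_restore --list "$dump_file" > /dev/null || return 1
  # Record a checksum next to the dump so the stored copy can be compared later.
  sha256sum "$dump_file" > "${dump_file}.sha256"
}

# Example usage:
#   verify_dump "alexandria-ee-${TIMESTAMP}.pgdump" && echo "dump looks readable"
```

A readable table of contents does not prove every row restores cleanly, but it catches truncated or zero-byte files cheaply.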
CLI backup (config + knowledge store, not DB)
# On a Quadlet/CLI node — backs up config, tools, knowledge store to a tar.gz
alexandria backup [--output ./alexandria-backup-$(date +%Y-%m-%d).tar.gz]
This produces a MANIFEST.json-anchored archive. It does not include Postgres data.
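A quick way to confirm the archive is intact is to list its members and look for the manifest. This sketch assumes the file is named MANIFEST.json and sits at the archive root or inside a single top-level directory; the exact layout is not documented here, and archive_has_manifest is a hypothetical helper.

```shell
# Sketch: succeed if a tar.gz archive contains a member named MANIFEST.json.
archive_has_manifest() {
  local archive="$1"
  # tar -tzf lists member names of a gzip-compressed archive without extracting it.
  tar -tzf "$archive" | grep -Eq '(^|/)MANIFEST\.json$'
}

# Example usage:
#   archive_has_manifest ./alexandria-backup-2024-01-01.tar.gz || echo "manifest missing" >&2
```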
Restoring from a Postgres snapshot
- Stop or scale down Alexandria pods to prevent writes during restore:
  kubectl scale deployment/alexandria-ee -n <namespace> --replicas=0
- Restore into the target database (drop and recreate if needed):
  DSN=$(kubectl get secret alexandria-ee -n <namespace> \
    -o jsonpath='{.data.database-dsn}' | base64 -d)
  # Drop and recreate the schema (destructive; confirm first)
  psql "$DSN" -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"
  # pg_restore takes the connection string via --dbname; its positional argument is the dump file
  pg_restore --dbname="$DSN" \
    --format=custom \
    --no-owner \
    --no-privileges \
    alexandria-ee-<timestamp>.pgdump
- Scale pods back up:
  kubectl scale deployment/alexandria-ee -n <namespace> --replicas=<count>
- Verify audit chain integrity after restore. The audit HMAC chain must be re-checked because a partial restore or replay could produce a broken chain:
  - There is no standalone alex-cli audit verify command today. Audit chain verification is performed internally by the Go API via RetireUnverifiableAuditRows in api-go/internal/store/audit.go.
  - Manual step: after restore, check the API logs at startup for any audit integrity warnings. If your compliance policy requires a full chain re-verification, query the audit_logs table and re-run the HMAC chain computation against the restored rows before returning the system to production.
  - If a future alex-cli audit verify command is added, run it here.
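The scale-up step needs the original replica count, so it can help to record it before scaling to zero. A minimal sketch, assuming the deployment name alexandria-ee used above; current_replicas and the KUBECTL override variable are hypothetical conveniences, not part of the product.

```shell
# Sketch: capture the desired replica count before scaling down.
# KUBECTL can be overridden (for example with a wrapper or a test stub).
KUBECTL=${KUBECTL:-kubectl}

# Print the deployment's current desired replica count.
current_replicas() {
  local namespace="$1"
  "$KUBECTL" get deployment alexandria-ee -n "$namespace" \
    -o jsonpath='{.spec.replicas}'
}

# Example usage:
#   REPLICAS=$(current_replicas my-namespace)   # run before scaling to 0
#   ... restore ...
#   kubectl scale deployment/alexandria-ee -n my-namespace --replicas="$REPLICAS"
```

If an autoscaler manages the deployment, the recorded value is only a snapshot and the autoscaler may adjust it again after scale-up.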
Verification
- Pods reach Ready state: kubectl get pods -n <namespace>
- /ready returns 200: curl -sf https://<host>/ready
- Spot-check that a known record exists: curl -sf https://<host>/license should reflect the correct license metadata.
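Pods may take a while to come back after scale-up, so the /ready check can be wrapped in a retry loop. The sketch below is one possible approach; wait_ready, its default of 30 attempts two seconds apart, and the CURL override variable are assumptions chosen for illustration.

```shell
# Sketch: poll a URL until curl -sf succeeds, or give up after max_tries attempts.
CURL=${CURL:-curl}

wait_ready() {
  local url="$1" max_tries="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$max_tries" ]; do
    # -s silences progress output; -f makes curl exit non-zero on HTTP status >= 400.
    if "$CURL" -sf -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}

# Example usage:
#   wait_ready "https://<host>/ready" 30 2 || echo "service did not become ready" >&2
```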
Rollback
If the restore is incomplete or the chain is broken, restore from a different snapshot. Do not attempt to patch audit rows manually — capture the affected rows for compliance review first, then consider RetireUnverifiableAuditRows (see incident-response.md — it is destructive).