Runbook: Helm Upgrade

When to use this

Use this runbook whenever you upgrade the Alexandria EE Helm release to a new chart version or image tag. It covers the pre-upgrade snapshot, the upgrade command, rollout verification, and the rollback path.

Pre-checks

  • Confirm the target image tag exists in Artifact Registry before upgrading.
  • Record the current release revision: helm history alexandria-ee -n <namespace> --max 5
  • Verify pods are currently healthy: kubectl get pods -n <namespace> -l app.kubernetes.io/instance=alexandria-ee
  • Check that the Postgres DSN secret is present in the namespace and that the database it points to is reachable from the cluster.
  • If the upgrade crosses a minor version that changes the DB schema, read the release notes for migration guidance. api-go/internal/db/migrations.go::ApplySchema runs automatically on startup, but destructive migrations require manual intervention.
  • Take a Postgres snapshot before proceeding (see backup-restore.md).
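The pod-health pre-check can be partly scripted. The sketch below is a minimal example, assuming bash and the default kubectl get pods table layout (NAME, READY, STATUS, ...); the function name unhealthy_pods is hypothetical and not part of the chart or its tooling. It reads the pod table on stdin and prints every pod that is not fully ready, which also lets you test it without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of pods that are not fully ready.
# Input: "kubectl get pods --no-headers" output on stdin, whose first three
# columns are NAME, READY (e.g. "1/2") and STATUS.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")              # ready[1] = ready containers, ready[2] = total
    if ($3 == "Completed") next         # finished Job pods normally show 0/1
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage (needs cluster access; records the current revision at the same time):
#   helm history alexandria-ee -n "$NS" --max 5 | tee pre-upgrade-history.txt
#   kubectl get pods -n "$NS" -l app.kubernetes.io/instance=alexandria-ee --no-headers \
#     | unhealthy_pods
# Empty output means every pod is Running with all containers ready.
```

The filter only looks at the first three columns, so it keeps working when newer kubectl versions append extra text to the RESTARTS column.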

Procedure

  1. Pull the new chart version or update your local chart directory.

  2. Identify the tier values file for your deployment (values-starter.yaml, values-professional.yaml, or values-enterprise.yaml).

  3. Pin the image tags in your site-specific values file. Never rely on an empty tag ("", which means latest) in production:

    image:
      api:
        tag: "v1.2.3"
      orchestrator:
        tag: "v1.2.3"
      dashboard:
        tag: "v1.2.3"

  4. Run the upgrade with --atomic so that Helm rolls back automatically if the upgrade fails or does not finish within the timeout:

    helm upgrade alexandria-ee k8s/helm/alexandria-ee/ \
      -n <namespace> \
      -f k8s/helm/alexandria-ee/values.yaml \
      -f k8s/helm/alexandria-ee/values-<tier>.yaml \
      -f /path/to/site-values.yaml \
      --atomic \
      --timeout 10m

  5. Watch pod rollout in a second terminal:

    kubectl rollout status deployment/alexandria-ee -n <namespace> --timeout=10m
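Before running step 4, you can check that every rendered image reference carries an explicit tag. This is a minimal sketch, assuming bash and that the chart renders container images as ordinary image: lines; it pipes helm template output through a hypothetical unpinned_images function that flags references with no tag, an empty tag, or the latest tag, and accepts digest-pinned references.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rendered Kubernetes manifests on stdin and print
# every container image reference that is not pinned to an explicit tag.
unpinned_images() {
  local ref name
  grep -E '^[[:space:]-]*image:' \
    | sed -E 's/^[[:space:]-]*image:[[:space:]]*//' \
    | tr -d "\"'" \
    | while read -r ref; do
        [ -n "$ref" ] || continue
        name=${ref##*/}                 # last path segment, e.g. "api:v1.2.3"
        case "$name" in
          *@sha256:*) ;;                # pinned by digest
          *:latest|*:) echo "$ref" ;;   # explicit latest or empty tag
          *:*) ;;                       # explicit tag
          *) echo "$ref" ;;             # no tag at all, so the runtime pulls latest
        esac
      done
}

# Usage: render with the same values files as the upgrade, then inspect.
#   helm template alexandria-ee k8s/helm/alexandria-ee/ \
#     -f k8s/helm/alexandria-ee/values.yaml \
#     -f k8s/helm/alexandria-ee/values-<tier>.yaml \
#     -f /path/to/site-values.yaml | unpinned_images
```

Taking only the last path segment before looking for a colon keeps registry hosts with ports (for example host:5000/repo/image) from being mistaken for tagged images.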
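Steps 4 and 5 can be wrapped in a small script so that the tier values file is derived from one argument and every Deployment in the release is watched, not only one. This is a sketch, assuming bash; the function names and the CHART_DIR variable are hypothetical, and it assumes every Deployment carries the app.kubernetes.io/instance=alexandria-ee label used in the pre-checks.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for steps 4 and 5. CHART_DIR is an assumption.
CHART_DIR=${CHART_DIR:-k8s/helm/alexandria-ee}

# Map a tier name to its values file; reject anything unexpected.
values_file_for_tier() {
  case "$1" in
    starter|professional|enterprise) echo "$CHART_DIR/values-$1.yaml" ;;
    *) echo "unknown tier: '$1'" >&2; return 1 ;;
  esac
}

# Step 4: upgrade with --atomic so a failed or timed-out upgrade is rolled back.
run_upgrade() {
  local ns=$1 tier=$2 site_values=$3 tier_values
  tier_values=$(values_file_for_tier "$tier") || return 1
  helm upgrade alexandria-ee "$CHART_DIR/" \
    -n "$ns" \
    -f "$CHART_DIR/values.yaml" \
    -f "$tier_values" \
    -f "$site_values" \
    --atomic \
    --timeout 10m
}

# Step 5: wait on every Deployment selected by the release label.
# Each rollout gets its own 10m timeout, so the total wait can be longer.
wait_for_rollouts() {
  local ns=$1 d rc=0
  for d in $(kubectl get deployments -n "$ns" \
               -l app.kubernetes.io/instance=alexandria-ee -o name); do
    kubectl rollout status "$d" -n "$ns" --timeout=10m || rc=1
  done
  return "$rc"
}

# Usage:
#   run_upgrade <namespace> professional /path/to/site-values.yaml   # terminal 1
#   wait_for_rollouts <namespace>                                     # terminal 2
```

Validating the tier name before calling Helm means a typo fails immediately instead of surfacing as a missing values file partway through the command.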

Verification

  • Health endpoint: curl -sf https://<host>/ready should return 200 OK.
  • Setup screen (first install only): curl -sf https://<host>/auth/setup returns 200 if no admin exists yet.
  • License endpoint: curl -sf https://<host>/license; confirm that the tier, current_seats, and expires_at fields look correct.
  • Check pod logs for startup errors: kubectl logs -n <namespace> -l app.kubernetes.io/instance=alexandria-ee -c api --tail=50
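The /ready and /license checks can be combined into one smoke test that you rerun after the upgrade and after any rollback. This is a minimal sketch, assuming bash and that /license returns a JSON body; the field check is a plain grep for the three key names rather than a real JSON parse, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the JSON on stdin mentions every
# license field this runbook asks you to inspect. It checks that the keys are
# present; it does not validate their values.
license_has_fields() {
  local body f missing=0
  body=$(cat)
  for f in tier current_seats expires_at; do
    if ! printf '%s' "$body" | grep -q "\"$f\"[[:space:]]*:"; then
      echo "missing license field: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical smoke test: /ready must not return an HTTP error (curl -f fails
# on status 400 and above), then /license must include the expected fields.
smoke_test() {
  local host=$1
  curl -sf "https://$host/ready" >/dev/null || { echo "/ready check failed" >&2; return 1; }
  curl -sf "https://$host/license" | license_has_fields
}

# Usage: smoke_test <host>   # then read the /license values themselves by eye
```

The script only confirms that the endpoints answer and the fields exist; whether the tier, seat count, and expiry are the right values for this site still needs a human check.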

Rollback

If the upgrade fails and --atomic did not trigger (e.g., you omitted it):

    # See available revisions
    helm history alexandria-ee -n <namespace>

    # Roll back to the chosen revision (usually the one before the failed upgrade)
    helm rollback alexandria-ee <revision> -n <namespace> --timeout 5m

After rollback, re-verify the /ready and /license endpoints. helm rollback reverts only the Kubernetes resources, not the database, so if the schema migration ran and is not reversible, restore from the pre-upgrade Postgres snapshot (see backup-restore.md).
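A guarded version of the rollback command refuses anything that is not a revision number and waits for the rolled-back resources before you re-run the verification checks. This is a sketch, assuming bash; rollback_to is a hypothetical name. It adds helm rollback's --wait flag, which the command above does not use, so that re-verification runs against ready pods. It does not touch the database; restoring the snapshot remains a manual step.

```shell
#!/usr/bin/env bash
# Hypothetical guard around "helm rollback"; run "helm history" first to pick <revision>.
rollback_to() {
  local ns=$1 rev=$2
  case "$rev" in
    ''|*[!0-9]*) echo "revision must be a number, got '$rev'" >&2; return 2 ;;
  esac
  # --wait: block until the rolled-back resources are ready (bounded by --timeout).
  # This reverts Kubernetes resources only; it does not undo schema migrations.
  helm rollback alexandria-ee "$rev" -n "$ns" --timeout 5m --wait
}

# Usage:
#   helm history alexandria-ee -n <namespace>
#   rollback_to <namespace> <revision>
# then re-check /ready and /license.
```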