Runbook: Tool-Approval Gating — End-to-End Smoke

When to use this

Use this runbook after a deploy to validate, end to end, the per-call human approval gate for MCP tool calls. It exercises the full flow: agent token mint with approval_required_tools, dispatch through POST /mcp/agents/{name}, the approve and reject branches, the idempotency cache, the Prometheus gauges, and the Alertmanager notification path.

The feature ships across these merged commits:

9ce81b8 — schema + agent store for approval_required_tools
487d328 — idempotency-keyed permission_requests + execution cache
6b55f0e — bake ApprovalRequiredTools into agent JWT
1f46157 — Alertmanager notifier + composite + reconciler + admin API
1288959 — MCP dispatcher per-call approval gate + Prometheus gauges + NotifyPending
5d82a68 — frontend Notifications settings (URLs + severity + test button)

Prereqs

kubectl context must point at the EE cluster:

kubectl config current-context
# expect: gke_<project>_<region>_alexandria-...
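
The context check can be turned into a guard so that nothing later in the runbook runs against the wrong cluster. This is a minimal sketch: is_ee_context is a hypothetical helper, and its glob only mirrors the expected name shown above.

```shell
# Hypothetical helper: succeed only if the context name looks like the
# GKE EE cluster, i.e. gke_<project>_<region>_alexandria-...
is_ee_context() {
  case "$1" in
    gke_*_*_alexandria-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumes kubectl is on PATH):
#   is_ee_context "$(kubectl config current-context)" \
#     || { echo "wrong kubectl context" >&2; exit 1; }
```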

Port-forward the API service. Use the alex-pf skill or run directly. (Stale port-forwards silently break after pod redeploys — kill them first.)
```
pkill -f "kubectl.*port-forward.*alexandria" || true
kubectl -n alexandria port-forward svc/alexandria-ee 8080:80 &
```
Local base URL is then http://127.0.0.1:8080.
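
Because the port-forward is started in the background, the first request can race it. One way to avoid that is to poll the public /metrics route until it answers. The sketch below is an assumption, not part of the tooling: wait_for_api is a hypothetical name, and the default of 10 tries with a 1 second delay is arbitrary.

```shell
# Hypothetical helper: poll <url>/metrics until it responds or we give up.
# Arguments: base URL, max attempts (default 10), delay in seconds (default 1).
wait_for_api() {
  local url="$1" tries="${2:-10}" delay="${3:-1}" i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP errors as well as connection errors.
    if curl -fsS -o /dev/null "$url/metrics"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "API not reachable at $url after $tries attempts" >&2
  return 1
}

# Usage: wait_for_api "http://127.0.0.1:8080" || exit 1
```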

Admin password. Read it from the bootstrap secret (do not write it to logs or paste it into chat):

kubectl -n alexandria get secret alexandria-ee-auth \
  -o jsonpath='{.data.admin-password}' | base64 -d

Export it locally for the rest of the runbook:

export ALEX_PASS='<paste-password>'
export ALEX_URL='http://127.0.0.1:8080'

A real MCP server must be registered, and an agent must exist whose allowed_tools include the tool to be gated. Examples in this runbook use agent=demo-agent and tool=write_file; substitute the agent and tool names that fit your deployment.

Step 1 — Login (super_admin access token)

ACCESS_TOKEN=$(curl -sS -X POST "$ALEX_URL/auth/login" \
  -H 'Content-Type: application/json' \
  -d "$(jq -nc --arg p "$ALEX_PASS" '{username:"admin", password:$p}')" \
  | jq -r '.access_token')
echo "${ACCESS_TOKEN:0:20}..."

Response shape (full body):

{
  "access_token": "eyJ...",
  "refresh_token": "...",
  "token_type": "Bearer",
  "expires_in": 900
}

access_token must be a super_admin token — approve/reject and the approval-required-tools route both require role super_admin.
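
A failed login is easy to miss here, because jq -r prints the literal string null when .access_token is absent and the later steps then fail with confusing auth errors. A small guard, sketched below with the hypothetical name require_token, catches that early.

```shell
# Hypothetical guard: reject an empty token and the literal "null" that
# jq -r emits when the field is missing from the login response.
require_token() {
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "login failed: no access_token in response" >&2
    return 1
  fi
}

# Usage: require_token "$ACCESS_TOKEN" || exit 1
```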

Step 2 — Set `approval_required_tools` on the agent

The dedicated admin route lives on main:

PUT /admin/agents/{name}/approval-required-tools

It accepts { "tools": [...] }, replaces the list, bumps permissions_version (so any existing agent JWT is invalidated and must be re-minted), and returns the updated agent DTO. Super_admin only.

curl -sS -X PUT "$ALEX_URL/admin/agents/demo-agent/approval-required-tools" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"tools": ["write_file"]}' | jq

Expect HTTP 200 with the agent DTO; approval_required_tools in the response body should equal ["write_file"].
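
The curl call above pipes straight into jq, so the status code is never shown. To check the HTTP 200 explicitly, one option is to have curl append the status with -w and split it off. The wrapper name set_approval_tools is hypothetical; it assumes ALEX_URL and ACCESS_TOKEN are exported as in the earlier steps.

```shell
# Hypothetical wrapper around the PUT route: prints the response body and
# returns non-zero unless the HTTP status is exactly 200.
set_approval_tools() {
  local agent="$1" body="$2" out status
  out=$(curl -sS -X PUT "$ALEX_URL/admin/agents/$agent/approval-required-tools" \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H 'Content-Type: application/json' \
    -d "$body" \
    -w '\n%{http_code}') || return 1
  # curl wrote the status code on its own final line; separate it from the body.
  status=$(printf '%s\n' "$out" | tail -n 1)
  printf '%s\n' "$out" | sed '$d'
  [ "$status" = "200" ]
}

# Usage: set_approval_tools demo-agent '{"tools": ["write_file"]}' | jq
```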

If you are running against a build that pre-dates the route landing, mint a fresh agent and pass approval_required_tools to the create route, or use the SQL fallback documented under the feature MR. The dedicated PUT route is the supported path going forward.

Step 3 — Mint an agent token

AGENT_TOKEN=$(curl -sS -X POST "$ALEX_URL/v1/agent-token" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"agent": "demo-agent"}' \
  | jq -r '.token')
echo "${AGENT_TOKEN:0:20}..."

Response shape:

{
  "token": "eyJ...",
  "agent": "demo-agent",
  "effective_tools": ["write_file", "..."],
  "expires_in": 3600
}

The agent JWT bakes in both effective_tools and approval_required_tools at mint time (see api-go/internal/routes/auth.go::handleAgentToken). If the agent token was minted before Step 2, it carries the old approval list — always mint after changing policy.
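
To confirm what a freshly minted token actually carries, the JWT payload can be decoded locally. The sketch below only base64url-decodes the middle segment and does not verify the signature. jwt_payload is a hypothetical helper, and since the exact claim names are not documented in this runbook, inspect the output rather than assuming a field name.

```shell
# Hypothetical helper: print the decoded payload (second segment) of a JWT.
# No signature verification is performed.
jwt_payload() {
  local seg
  # base64url uses '-' and '_' where standard base64 uses '+' and '/'.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Usage: jwt_payload "$AGENT_TOKEN" | jq
```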

Step 4 — Dispatch a gated `tools/call`

The MCP dispatch endpoint is POST /mcp/agents/{name} (agent-scoped) or POST /mcp (unscoped). Both extract X-Idempotency-Key from the request header and stash it on the Caller struct (see api-go/internal/routes/mcp_routes.go::handleMCPAgentDispatch and internal/mcp/dispatch.go::Caller.IdempotencyKey). For the gating flow we use a stable, caller-chosen UUID so a retry hits the same row.
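
If the key comes from somewhere other than uuidgen (a CI variable, for example), it may be worth checking its shape before dispatching. The check below is a sketch with a hypothetical name; whether the server itself rejects non-UUID keys is not stated in this runbook.

```shell
# Hypothetical check: succeed only for the canonical 8-4-4-4-12 hex layout.
# Accepts upper and lower case, since uuidgen output differs by platform.
is_uuid() {
  printf '%s' "$1" \
    | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Usage: is_uuid "$IDEMPOTENCY_KEY" || { echo "bad idempotency key" >&2; exit 1; }
```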

IDEMPOTENCY_KEY=$(uuidgen)
RESP1=$(curl -sS -X POST "$ALEX_URL/mcp/agents/demo-agent" \
  -H "Authorization: Bearer $AGENT_TOKEN" \
  -H "X-Idempotency-Key: $IDEMPOTENCY_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "write_file",
      "arguments": {"path": "/tmp/smoke.txt", "content": "hello"}
    }
  }')
echo "$RESP1" | jq
REQUEST_ID=$(echo "$RESP1" | jq -r '.result.request_id')
echo "REQUEST_ID=$REQUEST_ID"

Expected response body (after the 30s in-band poll cap expires):

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "pending": true,
    "request_id": "<uuid>",
    "retry_after_ms": 2000
  }
}

The dispatcher creates one permission_requests row keyed by the idempotency key (store.UpsertPendingPermissionRequest) and waits up to 30s for a terminal status before returning the pending envelope (see internal/mcp/dispatch.go::handleApprovalGate and pollApproval). It also fires NotifyPending against the configured notifier (webhook, Alertmanager, or both) and increments the created event counter.
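
A client that wants to wait for the decision can keep retrying with the same idempotency key, pausing for retry_after_ms between attempts. The loop below is a sketch under assumptions: both function names are hypothetical, the caller supplies the dispatch command, and the grep pattern is a crude stand-in for a proper jq test on .result.pending. Each dispatch can itself block for up to 30s, so keep the attempt count small.

```shell
# Convert a retry_after_ms value into a seconds string for sleep.
ms_to_seconds() {
  printf '%d.%03d' "$(( $1 / 1000 ))" "$(( $1 % 1000 ))"
}

# Hypothetical loop: re-run the given command until its output no longer
# looks like a pending envelope, then print that output.
# Arguments: max attempts, wait in ms, then the command and its arguments.
poll_until_settled() {
  local max="$1" wait_ms="$2" i=0 resp
  shift 2
  while [ "$i" -lt "$max" ]; do
    resp=$("$@")
    if ! printf '%s' "$resp" | grep -Eq '"pending"[[:space:]]*:[[:space:]]*true'; then
      printf '%s\n' "$resp"
      return 0
    fi
    i=$((i + 1))
    # Fractional sleep works with GNU and BSD sleep.
    sleep "$(ms_to_seconds "$wait_ms")"
  done
  return 1
}

# Usage, with dispatch_once being your own function that repeats the curl
# call above with the same X-Idempotency-Key:
#   poll_until_settled 5 2000 dispatch_once | jq
```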

Step 5 — Approve the request

In a new shell (or after the pending response returns), approve as super_admin:

curl -sS -X POST "$ALEX_URL/admin/permissions/$REQUEST_ID/approve" \
  -H "Authorization: Bearer $ACCESS_TOKEN" | jq

Response is the full permission request DTO with "status": "approved", reviewed_by, and reviewed_at set. The approve handler decrements the pending gauge and emits an approved event (see routes/permissions_routes.go::handleApprovePermissionRequest).

Step 6 — Retry with the same idempotency key

Repeat the exact same dispatch from Step 4 (same X-Idempotency-Key):

RESP2=$(curl -sS -X POST "$ALEX_URL/mcp/agents/demo-agent" \
  -H "Authorization: Bearer $AGENT_TOKEN" \
  -H "X-Idempotency-Key: $IDEMPOTENCY_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "write_file",
      "arguments": {"path": "/tmp/smoke.txt", "content": "hello"}
    }
  }')
echo "$RESP2" | jq

Expected: the dispatcher reads the row, sees status=approved, and competes in the at-most-once execute race in MarkPermissionRequestExecuted. It then does one of the following:

If it wins the race, it executes the underlying MCP tools/call, caches the raw result on the row, and returns the actual tool result.
If a concurrent call already executed, it returns the cached result written by SetPermissionRequestResult (see executeApproved in internal/mcp/dispatch.go).

Either way RESP2.result is the real tool result — not a pending envelope.
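
To exercise the execute race itself, two retries can be fired concurrently against a freshly approved request and their bodies compared. This is an optional sketch, not part of the required smoke: race_retries is a hypothetical helper, the caller supplies the dispatch command, and whether the two bodies come back byte-identical depends on how the cached result is serialized, which this runbook does not specify.

```shell
# Hypothetical helper: run the given command twice in parallel and succeed
# only if both invocations produced identical output.
race_retries() {
  local a b p1 p2 rc=0
  a=$(mktemp) && b=$(mktemp) || return 1
  "$@" >"$a" & p1=$!
  "$@" >"$b" & p2=$!
  # Wait only for our two jobs, so a background port-forward does not block us.
  wait "$p1" "$p2"
  cmp -s "$a" "$b" || rc=1
  rm -f "$a" "$b"
  return "$rc"
}

# Usage, with dispatch_once repeating the Step 6 curl call:
#   race_retries dispatch_once && echo "both retries returned the same body"
```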

Step 7 — Metrics check

/metrics is public and unauthenticated (registered in router.go as r.Get("/metrics", handleMetrics())).

curl -sS "$ALEX_URL/metrics" | grep -E '^alexandria_permission_request' | sort

Expected after one full create → approve → execute cycle:

alexandria_permission_request_pending{request_type="tool_call",tool="write_file"} 0
alexandria_permission_request_events_total{request_type="tool_call",tool="write_file",event="created"} >= 1
alexandria_permission_request_events_total{request_type="tool_call",tool="write_file",event="approved"} >= 1

Live trajectory of the pending gauge during the run: 0 → 1 on Step 4, then back to 0 on Step 5, because the decrement happens in the approve handler. It should still read 0 after the executor path runs in Step 6. The created and approved events_total counters each increment by 1 per cycle. Metric and label names come from internal/mcp/dispatch.go::PermissionRequestPending and PermissionRequestEvents.
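
The expectations above can be checked mechanically instead of by eye. The helper below is a sketch with a hypothetical name. It reads Prometheus text output on stdin and prints the value of the first sample whose name and label fragments all match, without depending on label order, since exporters may emit labels in a different order than shown here. It only handles samples that carry labels.

```shell
# Hypothetical helper: metric_value <name> <label-fragment>...
# Prints the sample value of the first matching line read from stdin.
metric_value() {
  local name="$1"
  shift
  awk -v name="$name" -v want="$*" '
    BEGIN { n = split(want, frags, " ") }
    index($0, name "{") == 1 {
      ok = 1
      for (i = 1; i <= n; i++) if (index($0, frags[i]) == 0) ok = 0
      if (ok) { print $NF; exit }
    }'
}

# Usage:
#   curl -sS "$ALEX_URL/metrics" \
#     | metric_value alexandria_permission_request_pending 'tool="write_file"'
```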

Step 8 — Reject path

Use a fresh idempotency key for a new request so the row is independent from Steps 4–6:

IDEMPOTENCY_KEY=$(uuidgen)
RESP3=$(curl -sS -X POST "$ALEX_URL/mcp/agents/demo-agent" \
  -H "Authorization: Bearer $AGENT_TOKEN" \
  -H "X-Idempotency-Key: $IDEMPOTENCY_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "write_file",
      "arguments": {"path": "/tmp/smoke.txt", "content": "reject me"}
    }
  }')
REQUEST_ID=$(echo "$RESP3" | jq -r '.result.request_id')

curl -sS -X POST "$ALEX_URL/admin/permissions/$REQUEST_ID/reject" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"reason": "smoke test rejection"}' | jq

# Retry the call.
curl -sS -X POST "$ALEX_URL/mcp/agents/demo-agent" \
  -H "Authorization: Bearer $AGENT_TOKEN" \
  -H "X-Idempotency-Key: $IDEMPOTENCY_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
      "name": "write_file",
      "arguments": {"path": "/tmp/smoke.txt", "content": "reject me"}
    }
  }' | jq

Expected on the retry:

{
  "jsonrpc": "2.0",
  "id": 4,
  "error": {
    "code": -32003,
    "message": "denied: smoke test rejection"
  }
}

-32003 is ErrAccessDenied from internal/mcp/dispatch.go. The reason string is read out of notifier_meta via extractReason.
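
Across Steps 4 to 8 a dispatch response takes one of three shapes: a pending envelope, a -32003 denial, or a real tool result. A script driving the smoke can branch on them as sketched below. classify_response is a hypothetical helper, and its grep patterns are a rough approximation; a jq test on .result.pending and .error.code is more precise when the body is guaranteed to be valid JSON.

```shell
# Hypothetical helper: print pending, denied, error, or result for a
# JSON-RPC response body passed as the first argument.
classify_response() {
  if printf '%s' "$1" | grep -Eq '"pending"[[:space:]]*:[[:space:]]*true'; then
    echo pending
  elif printf '%s' "$1" | grep -Eq '"code"[[:space:]]*:[[:space:]]*-32003([^0-9]|$)'; then
    # -32003 is the access-denied code returned after a rejection.
    echo denied
  elif printf '%s' "$1" | grep -q '"error"'; then
    echo error
  else
    echo result
  fi
}

# Usage: [ "$(classify_response "$RESP2")" = "result" ] || echo "unexpected: $RESP2" >&2
```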

/metrics should now also show alexandria_permission_request_events_total{...event="rejected"} >= 1 and the pending gauge should still be 0.

Step 9 — Alertmanager notification

Configure the Alertmanager endpoint from the Settings → Notifications UI or via the admin API directly:

curl -sS -X PATCH "$ALEX_URL/admin/notifications/config" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "alertmanager_url": "http://alertmanager.monitoring.svc:9093",
    "alertmanager_severity": "warning"
  }' | jq

Fire a synthetic alert and resolved-alert pair:

curl -sS -X POST "$ALEX_URL/admin/notifications/test" \
  -H "Authorization: Bearer $ACCESS_TOKEN" | jq

Expected response {"ok": true}. The handler calls NotifyPending and then NotifyReviewed against the active composite notifier (routes/notifications_routes.go::handleTestNotifications), so a properly configured Alertmanager will see a firing alert immediately followed by a resolved alert. Both URL save and test live behind the super_admin role check.

Frontend equivalent: log in as super_admin, open Settings → Notifications, paste the Alertmanager URL, click Save, then click Test. Frontend API client wraps the same endpoints (getNotificationsConfig, patchNotificationsConfig, testNotifications in frontend/src/api.ts).

Cleanup

Reset the approval list back to empty so the agent does not require approval for normal operation:

curl -sS -X PUT "$ALEX_URL/admin/agents/demo-agent/approval-required-tools" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"tools": []}' | jq

To drop the port-forward as well, run pkill -f "kubectl.*port-forward.*alexandria".

Placeholders used in this runbook

<ACCESS_TOKEN> — admin access token from Step 1 ($ACCESS_TOKEN).
<AGENT_TOKEN> — agent JWT from Step 3 ($AGENT_TOKEN).
<REQUEST_ID> — result.request_id from the pending response ($REQUEST_ID).
<IDEMPOTENCY_KEY> — caller-chosen UUID, stable across retries ($IDEMPOTENCY_KEY).

Related routes (verified against `routes/router.go` on `main`)

| Method | Path | Auth | Source |
| --- | --- | --- | --- |
| `POST` | `/auth/login` | none | `routes/auth.go::handleLogin` |
| `POST` | `/v1/agent-token` | human access | `routes/auth.go::handleAgentToken` |
| `POST` | `/mcp/agents/{name}` | bearer (agent or user) | `routes/mcp_routes.go::handleMCPAgentDispatch` |
| `POST` | `/mcp` | bearer | `routes/mcp_routes.go::handleMCPDispatch` |
| `PUT` | `/admin/agents/{name}/approval-required-tools` | super_admin | `routes/agents.go::handleSetApprovalRequiredTools` |
| `GET` | `/admin/permissions[?status=pending]` | admin | `routes/permissions_routes.go::handleListPermissionRequests` |
| `GET` | `/admin/permissions/{id}` | admin | `routes/permissions_routes.go::handleGetPermissionRequest` |
| `POST` | `/admin/permissions/{id}/approve` | admin | `routes/permissions_routes.go::handleApprovePermissionRequest` |
| `POST` | `/admin/permissions/{id}/reject` | admin | `routes/permissions_routes.go::handleRejectPermissionRequest` |
| `PATCH` | `/admin/notifications/config` | super_admin | `routes/notifications_routes.go::handlePatchNotificationsConfig` |
| `POST` | `/admin/notifications/test` | super_admin | `routes/notifications_routes.go::handleTestNotifications` |
| `GET` | `/metrics` | none | `routes/metrics.go::handleMetrics` |

Verified flow logically against code in commit 78fd009e36281d90eb43e2162c47a98791e4ddf4; end-to-end live test pending.
