OpenAI-compatible API

Alexandria exposes a subset of the OpenAI API shape so that tools built against the OpenAI SDK can be routed through Alexandria. The surface is agent-centric: the `model` field names an Alexandria agent, not a raw backend.

All routes require a human access token.

GET /v1/models

Lists the agents the caller can invoke, in OpenAI model shape. The standard OAI fields are present; Alexandria adds extra fields that standard OAI parsers will ignore.

Field	Source
`id`	agent name (canonical handle for `model` in chat completions)
`object`	`"model"`
`created`	unix timestamp of agent creation
`owned_by`	`"alexandria"`
`agent`	`true` (always — distinguishes from backend entries elsewhere)
`backend`	the backend the agent is pinned to, or `null`
`underlying_model`	the model name the agent passes through, or `null`
`capability_role`	the agent's role string

Response 200

{
  "object": "list",
  "data": [
    {
      "id": "researcher",
      "object": "model",
      "created": 1715000000,
      "owned_by": "alexandria",
      "agent": true,
      "backend": "anthropic",
      "underlying_model": "claude-sonnet-4-6",
      "capability_role": "default"
    }
  ]
}
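
As an illustration of consuming this response, the sketch below pulls out the entries a client could pass as `model`. The helper name `invocable_agents` and the keys of the returned dictionaries are hypothetical; only the field names in the sample body come from the table above.

```python
import json

# Sample body copied from the 200 response above.
SAMPLE = """
{
  "object": "list",
  "data": [
    {
      "id": "researcher",
      "object": "model",
      "created": 1715000000,
      "owned_by": "alexandria",
      "agent": true,
      "backend": "anthropic",
      "underlying_model": "claude-sonnet-4-6",
      "capability_role": "default"
    }
  ]
}
"""


def invocable_agents(body: dict) -> list:
    """Return a compact view of each agent entry in a /v1/models body."""
    agents = []
    for entry in body.get("data", []):
        # `agent` is an Alexandria-specific field; skip entries that lack it.
        if entry.get("agent") is not True:
            continue
        agents.append({
            "model": entry["id"],  # value to send as `model` in chat completions
            "backend": entry.get("backend"),  # may be None
            "underlying_model": entry.get("underlying_model"),  # may be None
            "role": entry.get("capability_role"),
        })
    return agents


if __name__ == "__main__":
    for agent in invocable_agents(json.loads(SAMPLE)):
        print(agent["model"], agent["backend"], agent["underlying_model"])
```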

For backend introspection (admin UI, internal tooling), use GET /v1/backends instead.

GET /v1/backends

Lists the registered LLM and .amodel backends. The body has the same shape that /v1/models used to return.

Field	Source
`id`	backend name
`object`	`"model"`
`owned_by`	`"alexandria"`
`capabilities`	array of capabilities (`"chat"`, `"embed"`, etc.)
`protocol`	wire protocol
`kind`	backend kind (`llama-cpp`, `open-ai`, `claude`, etc.)
`category`	`"registered"` or `"installed"`
`providers`	provider sub-variants (for `.amodel` backends)

Use this when you need backend-level metadata; use /v1/models to discover what an OpenAI client can invoke.
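
A minimal sketch of using this metadata to pick out backends by capability. The sample body is invented for illustration: the backend names are hypothetical, and the `{"object": "list", "data": [...]}` wrapper is an assumption based on the /v1/models shape, since this page does not show a /v1/backends response.

```python
def backends_with(body: dict, capability: str) -> list:
    """Return the ids of backends whose `capabilities` include `capability`."""
    return [
        backend["id"]
        for backend in body.get("data", [])
        if capability in backend.get("capabilities", [])
    ]


# Hypothetical response body; names and the list wrapper are assumptions.
SAMPLE_BACKENDS = {
    "object": "list",
    "data": [
        {"id": "local-llama", "object": "model", "owned_by": "alexandria",
         "capabilities": ["chat"], "kind": "llama-cpp", "category": "registered"},
        {"id": "embedder", "object": "model", "owned_by": "alexandria",
         "capabilities": ["embed"], "kind": "open-ai", "category": "registered"},
    ],
}

if __name__ == "__main__":
    print(backends_with(SAMPLE_BACKENDS, "embed"))
```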

POST /v1/chat/completions

OpenAI-compatible chat completions. The `model` field resolves in this order:

1. Strip a leading `workflow/` prefix silently if present (kept for backward compatibility; the bare agent name is now canonical).
2. Match an agent by name → invoke as that agent. Audit row: `agent.invoke`. This is the supported path.
3. Match a backend by name → wrap it in a synthetic `__passthrough__` agent and forward. The response carries `Warning: 299 - "Deprecated: invoke agents by name; raw backend prompting will be removed"`. Audit row: `agent.invoke` with `via_passthrough: true`. This path is soft-deprecated and will be removed once telemetry shows zero usage.
4. Neither found → 404 with `{"error": {"message": "model '<x>' not found ...", "type": "not_found"}}`.

Empty model → 400.
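
The resolution order above can be sketched as a pure function. This illustrates the documented rules and is not Alexandria's actual implementation; `Resolution`, `resolve_model`, and the use of plain sets for the agent and backend registries are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

WORKFLOW_PREFIX = "workflow/"


@dataclass
class Resolution:
    status: int                    # HTTP status the request would receive
    target: Optional[str] = None   # resolved agent or backend name
    via_passthrough: bool = False  # True when a backend name was matched


def resolve_model(model: str, agents: set, backends: set) -> Resolution:
    """Apply the documented lookup order to a `model` value."""
    if not model:
        return Resolution(status=400)
    # Step 1: silently drop the legacy prefix. How a bare "workflow/" is
    # treated is not stated in the docs; here it falls through to 404.
    if model.startswith(WORKFLOW_PREFIX):
        model = model[len(WORKFLOW_PREFIX):]
    # Step 2: agents are checked first, so they win over same-named backends.
    if model in agents:
        return Resolution(status=200, target=model)
    # Step 3: deprecated fallback through the synthetic passthrough agent.
    if model in backends:
        return Resolution(status=200, target=model, via_passthrough=True)
    # Step 4: nothing matched.
    return Resolution(status=404)


if __name__ == "__main__":
    print(resolve_model("workflow/researcher", {"researcher"}, {"anthropic"}))
```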

Session header: set X-Alexandria-Session: <session-id> (or session_id/conversation_id in the body) to control conversation continuity and cache affinity. If none is supplied, the session defaults to principal:<jwt-subject>.

Request (non-streaming)

{
  "model": "researcher",
  "messages": [
    { "role": "user", "content": "What is 2+2?" }
  ],
  "stream": false
}

Response 200 (non-streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "researcher",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "4" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

Note: Token usage counts are not yet populated (reported as 0). This is a known gap.
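
Until usage is populated, a client may want to avoid recording the zeros as real measurements. A minimal sketch, assuming an all-zero `usage` block should be read as "not reported"; the helper name `reported_usage` is hypothetical.

```python
from typing import Optional

USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def reported_usage(completion: dict) -> Optional[dict]:
    """Return the usage block, or None when every count is zero or missing."""
    usage = completion.get("usage") or {}
    if not any(usage.get(key, 0) for key in USAGE_KEYS):
        return None  # treat all-zero counts as "not reported"
    return usage


if __name__ == "__main__":
    empty = {"usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}}
    print(reported_usage(empty))
```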

Streaming request

{
  "model": "researcher",
  "messages": [{ "role": "user", "content": "Tell me a story." }],
  "stream": true
}

Streaming returns SSE chunks in the OpenAI chat.completion.chunk format:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1715000000,"model":"researcher","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: [DONE]
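
A minimal sketch of consuming this stream, assuming the lines have already been read from the response body and decoded to text. The function name `iter_content` is hypothetical, and the parser handles only the `data:` lines shown above, not the full SSE grammar.

```python
import json
from typing import Iterable, Iterator

# Lines copied from the example stream above.
SAMPLE_LINES = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1715000000,"model":"researcher","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield each text fragment from chat.completion.chunk events, in order."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    print("".join(iter_content(SAMPLE_LINES)))
```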

curl example

# Non-streaming, by agent name
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $ACCESS" \
  -H 'Content-Type: application/json' \
  -H 'X-Alexandria-Session: my-session' \
  -d '{"model":"researcher","messages":[{"role":"user","content":"Hello"}]}' | jq .

# Streaming
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $ACCESS" \
  -H 'Content-Type: application/json' \
  -d '{"model":"researcher","messages":[{"role":"user","content":"Hello"}],"stream":true}' \
  --no-buffer
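
The same request can be built with the Python standard library. This is a sketch: the function `build_chat_request` is hypothetical, and the base URL and session value simply mirror the curl examples above. The function only constructs the request; pass the result to `urllib.request.urlopen` to send it.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(base_url: str, token: str, agent: str, prompt: str,
                       session: Optional[str] = None,
                       stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {
        "model": agent,  # an agent name, per the resolution rules above
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if session:
        # Optional: pins conversation continuity and cache affinity.
        headers["X-Alexandria-Session"] = session
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("http://localhost:8080", "TOKEN", "researcher",
                             "Hello", session="my-session")
    print(req.get_method(), req.full_url)
```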

Deprecation: raw backend prompting

POST /v1/chat/completions historically accepted a backend name in model (e.g. claude-sonnet-4-6). That path still works — it routes through a synthetic __passthrough__ agent — but emits a Warning: 299 response header and a via_passthrough: true audit row. Migrate clients to invoke a named agent instead. The fallback will be removed in a future release.
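
To find clients that still rely on the fallback, a caller can check responses for the warning header. A minimal sketch, assuming the headers are available as a plain name-to-value mapping; the function name `deprecation_warning` is hypothetical.

```python
import re
from typing import Mapping, Optional

# Matches values of the form: 299 <agent> "<text>"
_WARNING_299 = re.compile(r'^\s*299\s+\S+\s+"(?P<text>.*)"\s*$')


def deprecation_warning(headers: Mapping[str, str]) -> Optional[str]:
    """Return the text of a 299 Warning header, or None if there is none."""
    for name, value in headers.items():
        if name.lower() != "warning":  # header names are case-insensitive
            continue
        match = _WARNING_299.match(value)
        if match:
            return match.group("text")
    return None


if __name__ == "__main__":
    sample = {
        "Warning": '299 - "Deprecated: invoke agents by name; '
                   'raw backend prompting will be removed"'
    }
    print(deprecation_warning(sample))
```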

POST /v1/embeddings

OpenAI-compatible embeddings. Backend-addressable — embeddings are not routed through agents. Emits a model.embed audit entry per call.

Backend resolution:

1. If `model` is set and matches a registered backend by name → use that backend.
2. Otherwise → auto-select the first backend with the `"embed"` capability, preferring one with `is_default = true`.
3. If the resolved backend lacks the `"embed"` capability → 400.
4. If no embed backend is registered → 503.
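
These rules can be sketched as a small function. It is an illustration, not the server's code: the `Backend` type and `resolve_embed_backend` are hypothetical, and "prefer is_default" is read here as "pick the first default embed backend, else the first embed backend in registration order", which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Backend:
    name: str
    capabilities: frozenset
    is_default: bool = False


def resolve_embed_backend(model: Optional[str],
                          backends: list) -> Tuple[int, Optional[str]]:
    """Return (status, backend name) following the documented rules."""
    by_name = {backend.name: backend for backend in backends}
    if model and model in by_name:
        chosen = by_name[model]
        if "embed" not in chosen.capabilities:
            return 400, None  # named backend cannot embed
        return 200, chosen.name
    # No model, or no backend by that name: fall back to auto-selection.
    candidates = [b for b in backends if "embed" in b.capabilities]
    if not candidates:
        return 503, None  # no embed backend registered
    preferred = next((b for b in candidates if b.is_default), candidates[0])
    return 200, preferred.name


if __name__ == "__main__":
    registry = [
        Backend("chat-only", frozenset({"chat"})),
        Backend("embed-a", frozenset({"embed"})),
        Backend("embed-b", frozenset({"embed"}), is_default=True),
    ]
    print(resolve_embed_backend(None, registry))
```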

Request

{
  "input": "Hello world",
  "model": "text-embed-3"
}

Or an array:

{
  "input": ["sentence one", "sentence two"]
}

Response 200

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.012, -0.034, 0.078, "..."],
      "index": 0
    }
  ],
  "model": "text-embed-3"
}

Errors

400 — backend lacks embed capability
503 — no embedding backend registered
502 — upstream embedding request failed

curl -s -X POST http://localhost:8080/v1/embeddings \
  -H "Authorization: Bearer $ACCESS" \
  -H 'Content-Type: application/json' \
  -d '{"input":"Hello world"}' | jq '.data[0].embedding | length'
