
OpenAI-compatible API

Alexandria exposes a subset of the OpenAI API shape so that tools built against the OpenAI SDK can be routed through Alexandria. The surface is agent-centric: the model field names an Alexandria agent, not a raw backend.

All routes require a human access token, passed as Authorization: Bearer <token>.


GET /v1/models

Lists the agents the caller can invoke, in OpenAI model shape. The standard OAI fields are present; Alexandria adds extra fields that standard OAI parsers will ignore.

Field             Source
id                agent name (canonical handle for model in chat completions)
object            "model"
created           unix timestamp of agent creation
owned_by          "alexandria"
agent             true (always; distinguishes agents from backend entries elsewhere)
backend           the backend the agent is pinned to, or null
underlying_model  the model name the agent passes through, or null
capability_role   the agent's role string

Response 200

{
  "object": "list",
  "data": [
    {
      "id": "researcher",
      "object": "model",
      "created": 1715000000,
      "owned_by": "alexandria",
      "agent": true,
      "backend": "anthropic",
      "underlying_model": "claude-sonnet-4-6",
      "capability_role": "default"
    }
  ]
}

For backend introspection (admin UI, internal tooling), use GET /v1/backends instead.
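As a rough illustration of consuming this response, the sketch below reduces a /v1/models body to the fields a client would typically need when choosing an agent. The function name invocable_agents is hypothetical, and the sample body simply mirrors the example response above; fetching the body over HTTP is left to whatever client you already use.

```python
from typing import Any


def invocable_agents(models_response: dict[str, Any]) -> list[dict[str, Any]]:
    """Summarise the agent entries of a /v1/models response body."""
    summaries = []
    for entry in models_response.get("data", []):
        # The docs say "agent" is always true on this route; the check only
        # guards against entries of some other kind slipping through.
        if entry.get("agent") is not True:
            continue
        summaries.append(
            {
                "id": entry["id"],  # the value to send as "model"
                "backend": entry.get("backend"),
                "underlying_model": entry.get("underlying_model"),
                "capability_role": entry.get("capability_role"),
            }
        )
    return summaries


# Sample body copied from the documented example response.
sample_models = {
    "object": "list",
    "data": [
        {
            "id": "researcher",
            "object": "model",
            "created": 1715000000,
            "owned_by": "alexandria",
            "agent": True,
            "backend": "anthropic",
            "underlying_model": "claude-sonnet-4-6",
            "capability_role": "default",
        }
    ],
}

agents = invocable_agents(sample_models)
print([a["id"] for a in agents])
```

The id of each summary is the string to pass as model in chat completions.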


GET /v1/backends

Lists the registered LLM and .amodel backends. The response has the same body shape that /v1/models used to return.

Field         Source
id            backend name
object        "model"
owned_by      "alexandria"
capabilities  array of capabilities ("chat", "embed", etc.)
protocol      wire protocol
kind          backend kind (llama-cpp, open-ai, claude, etc.)
category      "registered" or "installed"
providers     provider sub-variants (for .amodel backends)

Use this when you need backend-level metadata; use /v1/models to discover what an OpenAI client can invoke.
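A small sketch of using the capabilities field, for example to find which backends can serve embeddings. It assumes the body is wrapped in the usual {"object": "list", "data": [...]} envelope, which this page does not show explicitly. The backend names in the sample are hypothetical.

```python
from typing import Any


def backends_with_capability(
    backends_response: dict[str, Any], capability: str
) -> list[str]:
    """Return the ids of backends whose capabilities include `capability`."""
    return [
        entry["id"]
        for entry in backends_response.get("data", [])
        if capability in entry.get("capabilities", [])
    ]


# Hypothetical sample; the envelope and the backend names are assumptions.
sample_backends = {
    "object": "list",
    "data": [
        {"id": "local-llama", "object": "model", "owned_by": "alexandria",
         "capabilities": ["chat"], "kind": "llama-cpp", "category": "registered"},
        {"id": "text-embed-3", "object": "model", "owned_by": "alexandria",
         "capabilities": ["embed"], "kind": "open-ai", "category": "registered"},
    ],
}

print(backends_with_capability(sample_backends, "embed"))
```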


POST /v1/chat/completions

OpenAI-compatible chat completions. The model field resolves in this order:

  1. Strip workflow/ prefix silently if present (back-compat — bare agent name is now canonical).
  2. Match agent by name → invoke as that agent. Audit row: agent.invoke. This is the supported path.
  3. Match backend by name → wrap in a synthetic __passthrough__ agent and forward. Response carries Warning: 299 - "Deprecated: invoke agents by name; raw backend prompting will be removed". Audit row: agent.invoke with via_passthrough: true. Soft-deprecated; this path will be removed once telemetry shows zero usage.
  4. Neither found → 404 with {"error": {"message": "model '<x>' not found ...", "type": "not_found"}}.

Empty model → 400.
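The resolution order above can be sketched as a pure function. This is an illustration of the documented behaviour, not Alexandria's actual implementation: the Resolution type, the resolve_model name, and the choice to check for an empty model before stripping the prefix are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Header value quoted from the documentation above.
DEPRECATION_WARNING = (
    '299 - "Deprecated: invoke agents by name; '
    'raw backend prompting will be removed"'
)


@dataclass
class Resolution:
    error_status: Optional[int] = None  # None means the request can proceed
    target: Optional[str] = None        # resolved agent or backend name
    via_passthrough: bool = False       # recorded on the agent.invoke audit row
    headers: dict = field(default_factory=dict)


def resolve_model(model: Optional[str], agents: set, backends: set) -> Resolution:
    if not model:
        return Resolution(error_status=400)  # empty model
    # Step 1: strip the legacy "workflow/" prefix silently.
    name = model.removeprefix("workflow/")
    # Step 2: an agent match wins, even if a backend shares the name.
    if name in agents:
        return Resolution(target=name)
    # Step 3: backend match goes through the synthetic passthrough agent.
    if name in backends:
        return Resolution(
            target=name,
            via_passthrough=True,
            headers={"Warning": DEPRECATION_WARNING},
        )
    # Step 4: nothing matched.
    return Resolution(error_status=404)


known_agents = {"researcher"}
known_backends = {"anthropic"}
print(resolve_model("workflow/researcher", known_agents, known_backends))
```

Checking agents before backends is what makes the bare agent name canonical: a backend with the same name as an agent is never reached through this route.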

Session header: set X-Alexandria-Session: <session-id> (or session_id/conversation_id in the body) to control conversation continuity and cache affinity. Defaults to principal:<jwt-subject>.
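A minimal client-side sketch of attaching a session, assuming you build the headers and body yourself and hand them to an HTTP client of your choice. The helper name build_chat_request is hypothetical. When no session id is supplied the header is omitted, so the server falls back to its documented default of principal:<jwt-subject>.

```python
import json
from typing import Optional


def build_chat_request(
    access_token: str,
    agent: str,
    user_text: str,
    session_id: Optional[str] = None,
    stream: bool = False,
) -> tuple:
    """Return (headers, json_body) for POST /v1/chat/completions."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    # Only send the session header when the caller asked for a specific
    # session; otherwise let the server apply its default.
    if session_id is not None:
        headers["X-Alexandria-Session"] = session_id
    body = {
        "model": agent,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
    }
    return headers, json.dumps(body)


headers, body = build_chat_request(
    "token-placeholder", "researcher", "Hello", session_id="my-session"
)
print(headers["X-Alexandria-Session"])
```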

Request (non-streaming)

{
  "model": "researcher",
  "messages": [
    { "role": "user", "content": "What is 2+2?" }
  ],
  "stream": false
}

Response 200 (non-streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "researcher",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "4" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

Note: Token usage counts are not yet populated (reported as 0). This is a known gap.

Streaming request

{
  "model": "researcher",
  "messages": [{ "role": "user", "content": "Tell me a story." }],
  "stream": true
}

Streaming returns SSE chunks in the OpenAI chat.completion.chunk format:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1715000000,"model":"researcher","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: [DONE]
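A minimal sketch of consuming that stream, assuming the client already yields the response line by line as text. It collects the delta.content fragments and stops at the [DONE] sentinel. The function name is hypothetical, and the second sample chunk is invented for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from chat.completion.chunk SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample_lines = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"created":1715000000,"model":"researcher",'
    '"choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    # Hypothetical second chunk, added only to show accumulation.
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"created":1715000000,"model":"researcher",'
    '"choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]

story = "".join(iter_content_deltas(sample_lines))
print(story)  # prints "Once upon"
```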

curl example

# Non-streaming, by agent name
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $ACCESS" \
  -H 'Content-Type: application/json' \
  -H 'X-Alexandria-Session: my-session' \
  -d '{"model":"researcher","messages":[{"role":"user","content":"Hello"}]}' | jq .

# Streaming
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $ACCESS" \
  -H 'Content-Type: application/json' \
  -d '{"model":"researcher","messages":[{"role":"user","content":"Hello"}],"stream":true}' \
  --no-buffer

Deprecation: raw backend prompting

POST /v1/chat/completions historically accepted a backend name in model (e.g. claude-sonnet-4-6). That path still works — it routes through a synthetic __passthrough__ agent — but emits a Warning: 299 response header and a via_passthrough: true audit row. Migrate clients to invoke a named agent instead. The fallback will be removed in a future release.
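While migrating, a client can watch for the deprecation header to find call sites that still address a backend directly. The sketch below assumes response headers are available as a plain name-to-value mapping; the function name is hypothetical.

```python
from typing import Mapping, Optional


def passthrough_deprecation(headers: Mapping[str, str]) -> Optional[str]:
    """Return the Warning header value if it carries warn-code 299, else None."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "warning" and value.strip().startswith("299"):
            return value
    return None


# Header value as quoted in the resolution rules for /v1/chat/completions.
sample_headers = {
    "Content-Type": "application/json",
    "Warning": '299 - "Deprecated: invoke agents by name; '
               'raw backend prompting will be removed"',
}

notice = passthrough_deprecation(sample_headers)
if notice is not None:
    print("request used the deprecated backend passthrough:", notice)
```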


POST /v1/embeddings

OpenAI-compatible embeddings. Backend-addressable — embeddings are not routed through agents. Emits a model.embed audit entry per call.

Backend resolution:

  1. If model is set and matches a registered backend by name → use that backend.
  2. Otherwise → auto-select first backend with "embed" capability (prefer is_default = true).
  3. If the resolved backend lacks "embed" capability → 400.
  4. If no embed backend is registered → 503.
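The four rules above can be sketched as follows. This is one reading of the documented order, not the server's code: the Backend type and function name are hypothetical, the 200 status is only a marker for success, and a model value that matches no registered backend is treated as falling through to auto-selection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    name: str
    capabilities: tuple
    is_default: bool = False


def resolve_embed_backend(model: Optional[str], backends: list) -> tuple:
    """Return (status, backend); backend is None unless status is 200."""
    chosen = None
    # Rule 1: an explicit model that matches a registered backend by name.
    if model:
        chosen = next((b for b in backends if b.name == model), None)
    # Rule 2: otherwise auto-select, preferring the default embed backend.
    if chosen is None:
        embedders = [b for b in backends if "embed" in b.capabilities]
        if not embedders:
            return 503, None  # Rule 4: no embed backend registered
        chosen = next((b for b in embedders if b.is_default), embedders[0])
    # Rule 3: the resolved backend must be able to embed.
    if "embed" not in chosen.capabilities:
        return 400, None
    return 200, chosen


# Hypothetical registry for illustration.
registry = [
    Backend("chat-only", ("chat",)),
    Backend("embed-a", ("embed",)),
    Backend("embed-b", ("embed",), is_default=True),
]

status, backend = resolve_embed_backend(None, registry)
print(status, backend.name)
```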

Request

{
  "input": "Hello world",
  "model": "text-embed-3"
}

Or an array:

{
  "input": ["sentence one", "sentence two"]
}

Response 200

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.012, -0.034, 0.078, "..."],
      "index": 0
    }
  ],
  "model": "text-embed-3"
}
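When the input is an array, a client usually wants the vectors back in input order. The sketch below sorts by index, assuming (as in the OpenAI API) that index is the position of the corresponding input; this page does not state that explicitly. The function name and the sample vectors are hypothetical.

```python
from typing import Any


def vectors_in_input_order(embeddings_response: dict[str, Any]) -> list:
    """Return the embedding vectors sorted by their "index" field."""
    items = sorted(embeddings_response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hypothetical two-input response with entries deliberately out of order.
sample_embeddings = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.5, 0.25], "index": 1},
        {"object": "embedding", "embedding": [0.125, -0.75], "index": 0},
    ],
    "model": "text-embed-3",
}

vectors = vectors_in_input_order(sample_embeddings)
print(len(vectors), len(vectors[0]))
```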

Errors

  • 400 — backend lacks embed capability
  • 503 — no embedding backend registered
  • 502 — upstream embedding request failed

curl example

curl -s -X POST http://localhost:8080/v1/embeddings \
  -H "Authorization: Bearer $ACCESS" \
  -H 'Content-Type: application/json' \
  -d '{"input":"Hello world"}' | jq '.data[0].embedding | length'